The Battle of Neighborhoods

1.1 Background

Before international travel restrictions were introduced in response to the Covid-19 pandemic, tourism in Japan was on the rise, especially in Tokyo. Tokyo is the most populous metropolitan area in the world and one of the most visited tourist destinations, offering many unique experiences. Tokyo is a metropolitan prefecture made up of special wards and municipalities. Almost three-quarters of its population live in the eastern part of the prefecture, in what are referred to as the 23 special wards, which are therefore considered the core and most populous part of Tokyo. Each ward has a distinct character of its own for tourists and travelers to explore.

1.2 Business Problem

Many tourists travel to experience different cultures, traditions, and gastronomy. Choosing among the many options is difficult, however, because everyone has their own preferences about where to go, and the available information is so fragmented that travelers have to assemble it themselves, especially if they are interested in local, non-touristy recommendations.

This project therefore uses Foursquare API location data and machine learning to segment Tokyo's special wards by clustering their venue information. Such a segmentation could support a personalized travel planning service and help determine the ‘best’ areas for different activities, ranging from accommodations and attractions to restaurants, parks and more, so that visitors can have the best possible experience during their stay in Tokyo.
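
The clustering idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the area names and venue-category shares are made up, and the choice of two clusters is arbitrary. In the project itself the features would be derived from Foursquare venue data.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical per-area venue-category shares (each row sums to 1).
# Columns: cafés, temples, bars. All values are invented for illustration.
areas = ["Area A", "Area B", "Area C", "Area D"]
features = np.array([
    [0.70, 0.10, 0.20],
    [0.65, 0.15, 0.20],
    [0.10, 0.80, 0.10],
    [0.15, 0.75, 0.10],
])

# Group areas with similar venue profiles; n_clusters=2 is an assumption.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)

# Map each area to its cluster label (the label numbers themselves carry no meaning).
clusters = dict(zip(areas, kmeans.labels_.tolist()))
print(clusters)
```

Areas that land in the same cluster have similar venue mixes, which is what would let a planner suggest "more places like this one".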

1.3 Target Audience

  • Ministries of tourism or travel agencies that want to provide travel guidance so tourists can find the best locations for their interests.
  • Travelers who plan their own vacations. An in-depth analysis of the wards and districts can help them make informed decisions about where to go.
  • Business analysts or data scientists who want to analyze the areas of Tokyo using Python, Jupyter Notebook and machine learning techniques.

2. Data Requirements

The area analyzed in this project: Tokyo’s special wards.

Factors that will influence the decision:

  • Top 4 attractions in Tokyo
  • Top 10 most common venues of the tourism areas
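
As a rough sketch of how the "most common venues" factor could be computed, the snippet below counts venue categories per ward and keeps the most frequent ones. The `venues` table, its column names and the helper `top_categories` are hypothetical; in the project the records would come from the Foursquare API.

```python
import pandas as pd

# Hypothetical venue records, invented for illustration only.
venues = pd.DataFrame({
    "Ward": ["Shibuya", "Shibuya", "Shibuya", "Taitō", "Taitō", "Taitō"],
    "Venue Category": ["Café", "Café", "Ramen Restaurant",
                       "Temple", "Museum", "Temple"],
})

def top_categories(venues, n=10):
    """Return the n most frequent venue categories per ward, most common first."""
    counts = (
        venues.groupby(["Ward", "Venue Category"])
        .size()
        .rename("count")
        .reset_index()
        # Sort by frequency within each ward; ties are broken alphabetically.
        .sort_values(["Ward", "count", "Venue Category"],
                     ascending=[True, False, True])
    )
    return (
        counts.groupby("Ward")["Venue Category"]
        .apply(lambda s: list(s.head(n)))
        .to_dict()
    )

top = top_categories(venues, n=2)
print(top)
```

With `n=10` the same helper would yield the top 10 categories for each ward.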

Data Sources:

3. METHODOLOGY

Before we get the data and start exploring it, let's import all required libraries...

In [59]:
! pip install jupyter-conda
  ERROR: Could not find a version that satisfies the requirement nb-conda-kernels>=2.2.0 (from jupyter-conda) (from versions: none)
ERROR: No matching distribution found for nb-conda-kernels>=2.2.0 (from jupyter-conda)
In [1]:
! pip3 install lxml
Successfully installed lxml-4.6.3
In [2]:
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation
! pip install BeautifulSoup4
from bs4 import BeautifulSoup

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# install geopandas from source over HTTPS (GitHub no longer serves the unauthenticated git:// protocol)
!pip install git+https://github.com/geopandas/geopandas.git
import geopandas as gpd
!pip install geoplot
import geoplot as gplt

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images and HTML
from IPython.display import Image, HTML

# transforming a JSON file into a pandas dataframe (top-level import, available since pandas 1.0)
from pandas import json_normalize

!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library

# import k-means for the clustering stage
from sklearn.cluster import KMeans

import seaborn as sns
from matplotlib import pyplot as plt


print('Folium installed')
print('Libraries imported.')
Folium installed
Libraries imported.

Data Preparation

- Web Scraping

Use pandas to transform the data in the table on the Wikipedia page into a dataframe.

In [3]:
url = "https://en.wikipedia.org/wiki/Special_wards_of_Tokyo#List_of_special_wards"
In [4]:
source = requests.get(url).text
soup = BeautifulSoup(source, 'html5lib')

#find all html tables in the web page
tables = soup.find_all('table') # in html table is represented by the tag <table>
In [5]:
# we can see how many tables were found by checking the length of the tables list
len(tables)
Out[5]:
10
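
Because the page contains several tables, picking one by a fixed position can break when the page layout changes. A possible alternative, sketched here on a tiny inline HTML snippet rather than the live page, is to select the table by its header cells. The helper `find_table_index` and the sample HTML are illustrative assumptions, not part of the original notebook.

```python
from bs4 import BeautifulSoup

# Minimal stand-in for a page with several tables (made-up content).
html = """
<table><tr><th>A</th><th>B</th></tr><tr><td>1</td><td>2</td></tr></table>
<table><tr><th>No.</th><th>Name</th><th>Kanji</th></tr>
<tr><td>01</td><td>Chiyoda</td><td>千代田区</td></tr></table>
"""

def find_table_index(tables, required_headers):
    """Return the index of the first table whose <th> texts include all required headers."""
    required = set(required_headers)
    for i, table in enumerate(tables):
        headers = {th.get_text(strip=True) for th in table.find_all("th")}
        if required <= headers:
            return i
    return None  # no matching table found

soup = BeautifulSoup(html, "html.parser")
tables = soup.find_all("table")
idx = find_table_index(tables, ["Name", "Kanji"])
print(idx)
```

The returned index could then be used in place of a hard-coded position when passing the table to `pd.read_html`.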
In [13]:
#print(tables[index].prettify())

Scrape data from HTML table into a DataFrame using BeautifulSoup and read_html

In [15]:
tokyo_data = pd.read_html(str(tables[3]), flavor='bs4')
In [16]:
tokyo_data
Out[16]:
[        No.     Flag        Name    Kanji  Population(as of October 2016)  \
 0        01      NaN     Chiyoda     千代田区                           59441   
 1        02      NaN        Chūō      中央区                          147620   
 2        03      NaN      Minato       港区                          248071   
 3        04      NaN    Shinjuku      新宿区                          339211   
 4        05      NaN      Bunkyō      文京区                          223389   
 5        06      NaN       Taitō      台東区                          200486   
 6        07      NaN      Sumida      墨田区                          260358   
 7        08      NaN        Kōtō      江東区                          502579   
 8        09      NaN   Shinagawa      品川区                          392492   
 9        10      NaN      Meguro      目黒区                          280283   
 10       11      NaN         Ōta      大田区                          722608   
 11       12      NaN    Setagaya     世田谷区                          910868   
 12       13      NaN     Shibuya      渋谷区                          227850   
 13       14      NaN      Nakano      中野区                          332902   
 14       15      NaN    Suginami      杉並区                          570483   
 15       16      NaN     Toshima      豊島区                          294673   
 16       17      NaN        Kita       北区                          345063   
 17       18      NaN     Arakawa      荒川区                          213648   
 18       19      NaN    Itabashi      板橋区                          569225   
 19       20      NaN      Nerima      練馬区                          726748   
 20       21      NaN      Adachi      足立区                          674067   
 21       22      NaN  Katsushika      葛飾区                          447140   
 22       23      NaN     Edogawa     江戸川区                          685899   
 23  Overall  Overall     Overall  Overall                         9375104   
 
     Density(/km2)  Area(km2)  \
 0            5100      11.66   
 1           14460      10.21   
 2           12180      20.37   
 3           18620      18.22   
 4           19790      11.29   
 5           19830      10.11   
 6           18910      13.77   
 7           12510      40.16   
 8           17180      22.84   
 9           19110      14.67   
 10          11910      60.66   
 11          15690      58.05   
 12          15080      15.11   
 13          21350      15.59   
 14          16750      34.06   
 15          22650      13.01   
 16          16740      20.61   
 17          21030      10.16   
 18          17670      32.22   
 19          15120      48.08   
 20          12660      53.25   
 21          12850      34.80   
 22          13750      49.90   
 23          15146     619.00   
 
                                       Major districts  
 0   Nagatachō, Kasumigaseki, Ōtemachi, Marunouchi,...  
 1   Nihonbashi, Kayabachō, Ginza, Tsukiji, Hatchōb...  
 2   Odaiba, Shinbashi, Hamamatsuchō, Mita, Roppong...  
 3   Shinjuku, Takadanobaba, Ōkubo, Kagurazaka, Ich...  
 4                               Hongō, Yayoi, Hakusan  
 5                                       Ueno, Asakusa  
 6                       Kinshichō, Morishita, Ryōgoku  
 7   Kiba, Ariake, Kameido, Tōyōchō, Monzennakachō,...  
 8   Shinagawa, Gotanda, Ōsaki, Hatanodai, Ōimachi,...  
 9      Meguro, Nakameguro, Jiyugaoka, Komaba, Aobadai  
 10                Ōmori, Kamata, Haneda, Den-en-chōfu  
 11        Shimokitazawa, Kinuta, Karasuyama, Tamagawa  
 12        Shibuya, Ebisu, Harajuku, Daikanyama, Hiroo  
 13                                             Nakano  
 14                           Kōenji, Asagaya, Ogikubo  
 15               Ikebukuro, Komagome, Senkawa, Sugamo  
 16                               Akabane, Ōji, Tabata  
 17             Arakawa, Machiya, Nippori, Minamisenju  
 18                           Itabashi, Takashimadaira  
 19                        Nerima, Ōizumi, Hikarigaoka  
 20                      Ayase, Kitasenju, Takenotsuka  
 21                 Tateishi, Aoto, Kameari, Shibamata  
 22                                       Kasai, Koiwa  
 23                                                NaN  ]
In [20]:
#Create a dataframe with table

tokyo_wards = tokyo_data[0] #pd.read_html(str(tables[3]), flavor='bs4')[0]

tokyo_wards = tokyo_wards.rename(columns = {tokyo_wards.columns[2] : 'Ward', tokyo_wards.columns[-2] : 'Area', tokyo_wards.columns[-3] : 'Density', tokyo_wards.columns[-4] : 'Population'} )

tokyo_wards#.tail()
Out[20]:
    No.      Flag     Ward        Kanji    Population  Density  Area    Major districts
0   01       NaN      Chiyoda     千代田区     59441       5100     11.66   Nagatachō, Kasumigaseki, Ōtemachi, Marunouchi,...
1   02       NaN      Chūō        中央区      147620      14460    10.21   Nihonbashi, Kayabachō, Ginza, Tsukiji, Hatchōb...
2   03       NaN      Minato      港区       248071      12180    20.37   Odaiba, Shinbashi, Hamamatsuchō, Mita, Roppong...
3   04       NaN      Shinjuku    新宿区      339211      18620    18.22   Shinjuku, Takadanobaba, Ōkubo, Kagurazaka, Ich...
4   05       NaN      Bunkyō      文京区      223389      19790    11.29   Hongō, Yayoi, Hakusan
5   06       NaN      Taitō       台東区      200486      19830    10.11   Ueno, Asakusa
6   07       NaN      Sumida      墨田区      260358      18910    13.77   Kinshichō, Morishita, Ryōgoku
7   08       NaN      Kōtō        江東区      502579      12510    40.16   Kiba, Ariake, Kameido, Tōyōchō, Monzennakachō,...
8   09       NaN      Shinagawa   品川区      392492      17180    22.84   Shinagawa, Gotanda, Ōsaki, Hatanodai, Ōimachi,...
9   10       NaN      Meguro      目黒区      280283      19110    14.67   Meguro, Nakameguro, Jiyugaoka, Komaba, Aobadai
10  11       NaN      Ōta         大田区      722608      11910    60.66   Ōmori, Kamata, Haneda, Den-en-chōfu
11  12       NaN      Setagaya    世田谷区     910868      15690    58.05   Shimokitazawa, Kinuta, Karasuyama, Tamagawa
12  13       NaN      Shibuya     渋谷区      227850      15080    15.11   Shibuya, Ebisu, Harajuku, Daikanyama, Hiroo
13  14       NaN      Nakano      中野区      332902      21350    15.59   Nakano
14  15       NaN      Suginami    杉並区      570483      16750    34.06   Kōenji, Asagaya, Ogikubo
15  16       NaN      Toshima     豊島区      294673      22650    13.01   Ikebukuro, Komagome, Senkawa, Sugamo
16  17       NaN      Kita        北区       345063      16740    20.61   Akabane, Ōji, Tabata
17  18       NaN      Arakawa     荒川区      213648      21030    10.16   Arakawa, Machiya, Nippori, Minamisenju
18  19       NaN      Itabashi    板橋区      569225      17670    32.22   Itabashi, Takashimadaira
19  20       NaN      Nerima      練馬区      726748      15120    48.08   Nerima, Ōizumi, Hikarigaoka
20  21       NaN      Adachi      足立区      674067      12660    53.25   Ayase, Kitasenju, Takenotsuka
21  22       NaN      Katsushika  葛飾区      447140      12850    34.80   Tateishi, Aoto, Kameari, Shibamata
22  23       NaN      Edogawa     江戸川区     685899      13750    49.90   Kasai, Koiwa
23  Overall  Overall  Overall     Overall  9375104     15146    619.00  NaN
In [19]:
#Drop unused columns and the last row
tokyo_wards_data = tokyo_wards.drop(['No.', 'Flag'], axis=1)
tokyo_wards_data.drop([23], inplace=True) 
In [18]:
tokyo_wards_data
Out[18]:
    Ward        Kanji    Population  Density  Area    Major districts
0   Chiyoda     千代田区     59441       5100     11.66   Nagatachō, Kasumigaseki, Ōtemachi, Marunouchi,...
1   Chūō        中央区      147620      14460    10.21   Nihonbashi, Kayabachō, Ginza, Tsukiji, Hatchōb...
2   Minato      港区       248071      12180    20.37   Odaiba, Shinbashi, Hamamatsuchō, Mita, Roppong...
3   Shinjuku    新宿区      339211      18620    18.22   Shinjuku, Takadanobaba, Ōkubo, Kagurazaka, Ich...
4   Bunkyō      文京区      223389      19790    11.29   Hongō, Yayoi, Hakusan
5   Taitō       台東区      200486      19830    10.11   Ueno, Asakusa
6   Sumida      墨田区      260358      18910    13.77   Kinshichō, Morishita, Ryōgoku
7   Kōtō        江東区      502579      12510    40.16   Kiba, Ariake, Kameido, Tōyōchō, Monzennakachō,...
8   Shinagawa   品川区      392492      17180    22.84   Shinagawa, Gotanda, Ōsaki, Hatanodai, Ōimachi,...
9   Meguro      目黒区      280283      19110    14.67   Meguro, Nakameguro, Jiyugaoka, Komaba, Aobadai
10  Ōta         大田区      722608      11910    60.66   Ōmori, Kamata, Haneda, Den-en-chōfu
11  Setagaya    世田谷区     910868      15690    58.05   Shimokitazawa, Kinuta, Karasuyama, Tamagawa
12  Shibuya     渋谷区      227850      15080    15.11   Shibuya, Ebisu, Harajuku, Daikanyama, Hiroo
13  Nakano      中野区      332902      21350    15.59   Nakano
14  Suginami    杉並区      570483      16750    34.06   Kōenji, Asagaya, Ogikubo
15  Toshima     豊島区      294673      22650    13.01   Ikebukuro, Komagome, Senkawa, Sugamo
16  Kita        北区       345063      16740    20.61   Akabane, Ōji, Tabata
17  Arakawa     荒川区      213648      21030    10.16   Arakawa, Machiya, Nippori, Minamisenju
18  Itabashi    板橋区      569225      17670    32.22   Itabashi, Takashimadaira
19  Nerima      練馬区      726748      15120    48.08   Nerima, Ōizumi, Hikarigaoka
20  Adachi      足立区      674067      12660    53.25   Ayase, Kitasenju, Takenotsuka
21  Katsushika  葛飾区      447140      12850    34.80   Tateishi, Aoto, Kameari, Shibamata
22  Edogawa     江戸川区     685899      13750    49.90   Kasai, Koiwa

- Add Geospatial Data

Get the coordinates of the 23 special wards using geopy's Nominatim geocoder

In [25]:
from geopy.geocoders import Nominatim 
geolocator = Nominatim(user_agent="Tokyo_explorer")

tokyo_wards_data['Major_Dist_Coord']= tokyo_wards_data['Ward'].apply(geolocator.geocode).apply(lambda x: (x.latitude, x.longitude))
tokyo_wards_data[['Latitude', 'Longitude']] = tokyo_wards_data['Major_Dist_Coord'].apply(pd.Series)

tokyo_wards_data.drop(['Major_Dist_Coord'], axis=1, inplace=True)
tokyo_wards_data
Out[25]:
Ward Kanji Population Density Area Major districts Latitude Longitude
0 Chiyoda 千代田区 59441 5100 11.66 Nagatachō, Kasumigaseki, Ōtemachi, Marunouchi,... 35.693810 139.753216
1 Chūō 中央区 147620 14460 10.21 Nihonbashi, Kayabachō, Ginza, Tsukiji, Hatchōb... 35.666255 139.775565
2 Minato 港区 248071 12180 20.37 Odaiba, Shinbashi, Hamamatsuchō, Mita, Roppong... 35.643227 139.740055
3 Shinjuku 新宿区 339211 18620 18.22 Shinjuku, Takadanobaba, Ōkubo, Kagurazaka, Ich... 35.693763 139.703632
4 Bunkyō 文京区 223389 19790 11.29 Hongō, Yayoi, Hakusan 35.718810 139.744732
5 Taitō 台東区 200486 19830 10.11 Ueno, Asakusa 35.717450 139.790859
6 Sumida 墨田区 260358 18910 13.77 Kinshichō, Morishita, Ryōgoku 35.700429 139.805017
7 Kōtō 江東区 502579 12510 40.16 Kiba, Ariake, Kameido, Tōyōchō, Monzennakachō,... 35.649154 139.812790
8 Shinagawa 品川区 392492 17180 22.84 Shinagawa, Gotanda, Ōsaki, Hatanodai, Ōimachi,... 35.599252 139.738910
9 Meguro 目黒区 280283 19110 14.67 Meguro, Nakameguro, Jiyugaoka, Komaba, Aobadai 35.621250 139.688014
10 Ōta 大田区 722608 11910 60.66 Ōmori, Kamata, Haneda, Den-en-chōfu 35.561206 139.715843
11 Setagaya 世田谷区 910868 15690 58.05 Shimokitazawa, Kinuta, Karasuyama, Tamagawa 35.646096 139.656270
12 Shibuya 渋谷区 227850 15080 15.11 Shibuya, Ebisu, Harajuku, Daikanyama, Hiroo 35.664596 139.698711
13 Nakano 中野区 332902 21350 15.59 Nakano 35.718123 139.664468
14 Suginami 杉並区 570483 16750 34.06 Kōenji, Asagaya, Ogikubo 35.699493 139.636288
15 Toshima 豊島区 294673 22650 13.01 Ikebukuro, Komagome, Senkawa, Sugamo 35.736156 139.714222
16 Kita 北区 345063 16740 20.61 Akabane, Ōji, Tabata -0.220164 -78.512327
17 Arakawa 荒川区 213648 21030 10.16 Arakawa, Machiya, Nippori, Minamisenju 35.737529 139.781310
18 Itabashi 板橋区 569225 17670 32.22 Itabashi, Takashimadaira 35.774143 139.681209
19 Nerima 練馬区 726748 15120 48.08 Nerima, Ōizumi, Hikarigaoka 35.748360 139.638735
20 Adachi 足立区 674067 12660 53.25 Ayase, Kitasenju, Takenotsuka 35.783703 139.795319
21 Katsushika 葛飾区 447140 12850 34.80 Tateishi, Aoto, Kameari, Shibamata 35.751733 139.863816
22 Edogawa 江戸川区 685899 13750 49.90 Kasai, Koiwa 35.678278 139.871091
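
Note that the Kita row above resolves to coordinates outside Japan (-0.220164, -78.512327), since a bare ward name can match places elsewhere in the world. Below is a minimal sketch of one way to reduce such mismatches. It assumes that appending ", Tokyo, Japan" to the query steers the geocoder to the intended place, which is not verified here. The helper name geocode_in_tokyo is hypothetical; it accepts any geocoding callable (for example geolocator.geocode) and returns (None, None) when nothing is found instead of raising an error.

```python
from types import SimpleNamespace


def geocode_in_tokyo(name, geocode):
    """Geocode `name` with a Tokyo-qualified query.

    `geocode` is any callable that takes a query string and returns an
    object with `.latitude` and `.longitude`, or None when nothing matches.
    """
    query = "{}, Tokyo, Japan".format(name.strip())  # assumed query format
    location = geocode(query)
    if location is None:
        return (None, None)
    return (location.latitude, location.longitude)


# Stand-in geocoder so the sketch runs without network access.
# The coordinates are placeholders, not real geocoder output.
def fake_geocode(query):
    known = {"Kita, Tokyo, Japan": SimpleNamespace(latitude=35.0, longitude=139.0)}
    return known.get(query)


print(geocode_in_tokyo(" Kita", fake_geocode))
print(geocode_in_tokyo("Nowhere", fake_geocode))
```

In the notebook this could be used as tokyo_wards_data['Ward'].apply(lambda w: geocode_in_tokyo(w, geolocator.geocode)).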

Split the "Major districts" column into one row per district and name the new column "District"

In [23]:
tokyo_wards_data_dist = tokyo_wards_data.drop('Major districts', axis=1).join(tokyo_wards_data['Major districts'].str.split(',', expand=True).stack().reset_index(level=1, drop=True).rename('District'))

tokyo_wards_data_dist.head()
Out[23]:
Ward Kanji Population Density Area Latitude Longitude District
0 Chiyoda 千代田区 59441 5100 11.66 35.69381 139.753216 Nagatachō
0 Chiyoda 千代田区 59441 5100 11.66 35.69381 139.753216 Kasumigaseki
0 Chiyoda 千代田区 59441 5100 11.66 35.69381 139.753216 Ōtemachi
0 Chiyoda 千代田区 59441 5100 11.66 35.69381 139.753216 Marunouchi
0 Chiyoda 千代田区 59441 5100 11.66 35.69381 139.753216 Akihabara
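
Splitting on ',' alone keeps the space that follows each comma, so every district after the first one in a row starts with a leading space (this is why a later lookup for 'Aoyama' only matches ' Aoyama'). A small sketch of trimming the pieces, using a hypothetical helper split_districts; in pandas the same effect could probably be obtained by applying .str.strip() to the resulting "District" column.

```python
def split_districts(major_districts):
    """Split a comma-separated district string and trim surrounding spaces."""
    return [part.strip() for part in major_districts.split(",") if part.strip()]


raw = "Ueno, Asakusa"
print(raw.split(","))        # plain split keeps the leading space
print(split_districts(raw))  # trimmed names
```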

Get the coordinates of districts

In [24]:
from geopy.geocoders import Nominatim 
geolocator = Nominatim(user_agent="Tokyo_explorer")

tokyo_wards_data_dist['Major_Dist_Coord_']= tokyo_wards_data_dist['District'].apply(geolocator.geocode).apply(lambda x: (x.latitude, x.longitude))
tokyo_wards_data_dist[['District Latitude', 'District Longitude']] = tokyo_wards_data_dist['Major_Dist_Coord_'].apply(pd.Series)

tokyo_wards_data_dist.drop(['Major_Dist_Coord_'], axis=1, inplace=True)
tokyo_wards_data_dist
Out[24]:
Ward Kanji Population Density Area Latitude Longitude District District Latitude District Longitude
0 Chiyoda 千代田区 59441 5100 11.66 35.693810 139.753216 Nagatachō 35.675618 139.743469
0 Chiyoda 千代田区 59441 5100 11.66 35.693810 139.753216 Kasumigaseki 35.674054 139.750972
0 Chiyoda 千代田区 59441 5100 11.66 35.693810 139.753216 Ōtemachi 35.686788 139.766224
0 Chiyoda 千代田区 59441 5100 11.66 35.693810 139.753216 Marunouchi 35.680656 139.765222
0 Chiyoda 千代田区 59441 5100 11.66 35.693810 139.753216 Akihabara 35.701893 139.774368
... ... ... ... ... ... ... ... ... ... ...
21 Katsushika 葛飾区 447140 12850 34.80 35.751733 139.863816 Aoto 35.745574 139.856054
21 Katsushika 葛飾区 447140 12850 34.80 35.751733 139.863816 Kameari 35.766665 139.847801
21 Katsushika 葛飾区 447140 12850 34.80 35.751733 139.863816 Shibamata 35.756430 139.875181
22 Edogawa 江戸川区 685899 13750 49.90 35.678278 139.871091 Kasai -5.349800 21.424098
22 Edogawa 江戸川区 685899 13750 49.90 35.678278 139.871091 Koiwa 35.733184 139.881900

106 rows × 10 columns

In [26]:
#save
# Export the dataframe to CSV so that later steps can start from the saved copy
tokyo_wards_data_dist.to_csv('tokyo_wards_data_dist.csv',index=False)
In [3]:
#read
tokyo_wards_data_dist = pd.read_csv('tokyo_wards_data_dist.csv')
tokyo_wards_data_dist.head(3)
Out[3]:
Ward Kanji Population Density Area Latitude Longitude District District Latitude District Longitude
0 Chiyoda 千代田区 59441 5100 11.66 35.69381 139.753216 Nagatachō 35.675618 139.743469
1 Chiyoda 千代田区 59441 5100 11.66 35.69381 139.753216 Kasumigaseki 35.674054 139.750972
2 Chiyoda 千代田区 59441 5100 11.66 35.69381 139.753216 Ōtemachi 35.686788 139.766224

- Get top 4 attractions in Tokyo

For simplicity, build the dataset from the first four attractions listed by Google:

In [69]:
data = {'Attraction':  ['Sensō-ji Temple', 'Tokyo Skytree', 'Tokyo Tower', 'Meiji Shrine'],
        'Address': ['2-3-1 Asakusa, Taitō-ku, Tokyo', '1 Chome-1-2 Oshiage, Sumida City, Tokyo','4 Chome-2-8 Shibakoen, Minato City, Tokyo', '1-1 Yoyogikamizonocho, Shibuya City, Tokyo'],
        'Ward': ['Taitō', 'Sumida','Minato', 'Shibuya'],
        'District': ['Asakusa', 'Oshiage','Shibakoen','Shibuya' ]}


# df = pd.DataFrame (data, columns = ['Attraction','Address','District','Latitude of Attraction', 'Longitude of Attraction','Ward'])
df = pd.DataFrame (data, columns = ['Attraction','Address','District','Ward'])
In [70]:
# Get the latitude and longitude of each attraction
from geopy.geocoders import Nominatim 
geolocator = Nominatim(user_agent="Tokyo_explorer")

df['Major_Dist_Coord']= df['Attraction'].apply(geolocator.geocode).apply(lambda x: (x.latitude, x.longitude))
df[['Attraction Latitude', 'Attraction Longitude']] = df['Major_Dist_Coord'].apply(pd.Series)

df.drop(['Major_Dist_Coord'], axis=1, inplace=True)
df
Out[70]:
Attraction Address District Ward Attraction Latitude Attraction Longitude
0 Sensō-ji Temple 2-3-1 Asakusa, Taitō-ku, Tokyo Asakusa Taitō 35.713402 139.795519
1 Tokyo Skytree 1 Chome-1-2 Oshiage, Sumida City, Tokyo Oshiage Sumida 35.710054 139.810714
2 Tokyo Tower 4 Chome-2-8 Shibakoen, Minato City, Tokyo Shibakoen Minato 35.658586 139.745440
3 Meiji Shrine 1-1 Yoyogikamizonocho, Shibuya City, Tokyo Shibuya Shibuya 35.674842 139.699627
In [71]:
#save
# Export the dataframe to CSV so that task 2 can start from the saved copy
df.to_csv('tokyo_attractions.csv',index=False)
In [4]:
df = pd.read_csv('tokyo_attractions.csv')
df
Out[4]:
Attraction Address District Ward Attraction Latitude Attraction Longitude
0 Sensō-ji Temple 2-3-1 Asakusa, Taitō-ku, Tokyo Asakusa Taitō 35.713402 139.795519
1 Tokyo Skytree 1 Chome-1-2 Oshiage, Sumida City, Tokyo Oshiage Sumida 35.710054 139.810714
2 Tokyo Tower 4 Chome-2-8 Shibakoen, Minato City, Tokyo Shibakoen Minato 35.658586 139.745440
3 Meiji Shrine 1-1 Yoyogikamizonocho, Shibuya City, Tokyo Shibuya Shibuya 35.674842 139.699627

Let's visualize Tokyo

Firstly, get the geographical coordinates of Tokyo...

In [5]:
address = 'Tokyo'

geolocator = Nominatim(user_agent="Tokyo_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Tokyo are {}, {}.'.format(latitude, longitude))
The geographical coordinates of Tokyo are 35.6828387, 139.7594549.
In [6]:
#Check data types:
df.dtypes
Out[6]:
Attraction               object
Address                  object
District                 object
Ward                     object
Attraction Latitude     float64
Attraction Longitude    float64
dtype: object
In [7]:
tokyo_wards_data_dist.dtypes
Out[7]:
Ward                   object
Kanji                  object
Population              int64
Density                 int64
Area                  float64
Latitude              float64
Longitude             float64
District               object
District Latitude     float64
District Longitude    float64
dtype: object

Download the Tokyo GeoJSON file

In [2]:
!wget --quiet https://raw.githubusercontent.com/dataofjapan/land/master/tokyo.geojson
    
print('GeoJSON file downloaded!')
GeoJSON file downloaded!
In [6]:
bcn_geo = r'tokyo.geojson' # geojson file

The ward names in the GeoJSON file end with "Ku", while those in the Wikipedia table do not. So, to merge the GeoJSON dataframe (read from bcn_geo) with the Wikipedia table, I will match the "ward_ja" column against the "Kanji" column instead.

In [25]:
from shapely.geometry import shape
import json

gdf = gpd.read_file(bcn_geo)
gdf = gdf.merge(tokyo_wards_data_dist, left_on="ward_ja", right_on="Kanji")
gdf = gdf.drop(columns=['ward_ja','ward_en','Kanji','area_ja','area_en','code','Population','Area','Latitude','Longitude'])

gdf
Out[25]:
geometry Ward Density District District Latitude District Longitude
0 POLYGON ((139.82105 35.81508, 139.82168 35.814... Adachi 12660 Ayase 35.446047 139.430823
1 POLYGON ((139.82105 35.81508, 139.82168 35.814... Adachi 12660 Kitasenju 35.754036 139.804177
2 POLYGON ((139.82105 35.81508, 139.82168 35.814... Adachi 12660 Takenotsuka 35.794532 139.790712
3 POLYGON ((139.76093 35.73221, 139.76100 35.732... Bunkyō 19790 Hongō 35.175376 137.013476
4 POLYGON ((139.76093 35.73221, 139.76100 35.732... Bunkyō 19790 Yayoi 44.079308 143.552653
... ... ... ... ... ... ...
101 MULTIPOLYGON (((139.69539 35.60749, 139.69563 ... Ōta 11910 Den-en-chōfu 35.660036 139.554815
102 POLYGON ((139.81449 35.73880, 139.81466 35.737... Arakawa 21030 Arakawa 35.737529 139.781310
103 POLYGON ((139.81449 35.73880, 139.81466 35.737... Arakawa 21030 Machiya 35.742314 139.781413
104 POLYGON ((139.81449 35.73880, 139.81466 35.737... Arakawa 21030 Nippori 35.728380 139.770982
105 POLYGON ((139.81449 35.73880, 139.81466 35.737... Arakawa 21030 Minamisenju 35.736661 139.796677

106 rows × 6 columns

In [9]:
gdf.to_csv('gdf.csv',index=False)
In [7]:
gdf = pd.read_csv('gdf.csv')

Visualize the population density of Tokyo's special wards

In [147]:
# Initialize the figure
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, 1, figsize=(16, 12))

# Set up the color scheme:
import mapclassify as mc
scheme = mc.Quantiles(gdf['Density'], k=8)

# Map
gplt.choropleth(gdf, 
    hue="Density", 
    linewidth=.1,
    scheme=scheme, cmap='Dark2',
    legend=True,
    edgecolor='black',
    ax=ax
);

ax.set_title('Population Density (/km^2) in Each Ward of Tokyo', fontsize=13);

Now let's create a map of the major districts in the 23 special wards from their latitude and longitude values to check whether the locations are correct...

In [27]:
map_jp = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(gdf['District Latitude'], gdf['District Longitude'], gdf['District']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_jp)  
    
map_jp

As the map shows, several districts are misplaced and need to be fixed.

Before fixing the misplaced locations, we first join "df" with "gdf" (the GeoJSON data merged with "tokyo_wards_data_dist") to select only the districts in the wards where the top 4 attractions are located

In [28]:
df_tokyo = gdf.merge(df, on="Ward")
df_tokyo = df_tokyo.drop(columns=['Address','geometry','District_y', 'Attraction', 'Attraction Latitude','Attraction Longitude' ])
df_tokyo = df_tokyo.rename(columns={"District_x": "District"})
df_tokyo = df_tokyo.reset_index(drop=True)  # assign the result; reset_index() alone does not modify df_tokyo
df_tokyo
Out[28]:
Ward Density District District Latitude District Longitude
0 Taitō 19830 Ueno 35.713376 139.776656
1 Taitō 19830 Asakusa 35.717597 139.797563
2 Shibuya 15080 Shibuya 35.664596 139.698711
3 Shibuya 15080 Ebisu 35.646438 139.710210
4 Shibuya 15080 Harajuku 35.668705 139.705336
5 Shibuya 15080 Daikanyama 35.648157 139.703293
6 Shibuya 15080 Hiroo 42.285532 143.311616
7 Sumida 18910 Kinshichō 35.696312 139.815043
8 Sumida 18910 Morishita 35.687998 139.797044
9 Sumida 18910 Ryōgoku 35.696854 139.797428
10 Minato 12180 Odaiba 35.619050 139.779364
11 Minato 12180 Shinbashi 35.665106 139.756116
12 Minato 12180 Hamamatsuchō 35.655111 139.757062
13 Minato 12180 Mita 3.500009 -73.000009
14 Minato 12180 Roppongi 35.662457 139.733498
15 Minato 12180 Toranomon 35.670187 139.750056
16 Minato 12180 Aoyama 37.898632 139.001079
17 Minato 12180 Azabu 35.656402 139.733970
18 Minato 12180 Akasaka 35.671679 139.735622
In [11]:
map_jp_err = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(df_tokyo['District Latitude'], df_tokyo['District Longitude'], df_tokyo['District']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=1,
        parse_html=False).add_to(map_jp_err)  
    
map_jp_err
In [224]:
map_jp_err.save('visualize Err Loc Map.html')

According to the map above, the coordinates of several districts are off. This kind of error occurs easily because places elsewhere in Japan and around the world share the same names.
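
Besides inspecting the map, such outliers can be flagged numerically. The sketch below computes the great-circle (haversine) distance from the Tokyo coordinates printed earlier and flags any district farther away than a cutoff. The 30 km cutoff and the helper names are assumptions made for illustration; the sample coordinates are copied from the table above.

```python
import math

TOKYO_CENTER = (35.6828387, 139.7594549)  # coordinates printed earlier in this notebook
MAX_DISTANCE_KM = 30.0  # assumed cutoff, chosen only for illustration


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def is_misplaced(lat, lng, center=TOKYO_CENTER, max_km=MAX_DISTANCE_KM):
    """True when the point lies farther than `max_km` from `center`."""
    return haversine_km(lat, lng, center[0], center[1]) > max_km


districts = {
    "Ueno": (35.713376, 139.776656),
    "Hiroo": (42.285532, 143.311616),
    "Mita": (3.500009, -73.000009),
    "Aoyama": (37.898632, 139.001079),
}
flagged = [name for name, (lat, lng) in districts.items() if is_misplaced(lat, lng)]
print(flagged)
```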

Let's fix...

Districts: Aoyama, Hiroo, Mita

According to https://latitude.to/articles-by-country/jp/japan/34507/aoyama-minato-tokyo, the correct locations (latitude, longitude) are:

Aoyama: 35.6720, 139.7230

Hiroo: 35.6505, 139.7173

Mita: 35.6472, 139.7409

In [30]:
#Aoyama
df_tokyo.loc[df_tokyo['District'] == 'Aoyama']
Out[30]:
Ward Density District District Latitude District Longitude
In [31]:
# The district isn't found with the query above...

# It is found only with a leading space, left over from splitting on ','
df_tokyo.loc[df_tokyo['District'] == ' Aoyama']
Out[31]:
Ward Density District District Latitude District Longitude
16 Minato 12180 Aoyama 37.898632 139.001079
In [32]:
df_tokyo.iloc[16]
Out[32]:
Ward                   Minato
Density                 12180
District               Aoyama
District Latitude     37.8986
District Longitude    139.001
Name: 16, dtype: object
In [33]:
#Replace with the correct lat and lng values
df_tokyo.loc[16, ['District Latitude', 'District Longitude']] = [35.6720, 139.7230]
In [34]:
#check
df_tokyo.iloc[16]
Out[34]:
Ward                   Minato
Density                 12180
District               Aoyama
District Latitude      35.672
District Longitude    139.723
Name: 16, dtype: object
In [35]:
#Hiroo
df_tokyo.iloc[6]
Out[35]:
Ward                  Shibuya
Density                 15080
District                Hiroo
District Latitude     42.2855
District Longitude    143.312
Name: 6, dtype: object
In [36]:
#Replace with the correct lat and lng values
df_tokyo.loc[6, ['District Latitude', 'District Longitude']] = [35.6505, 139.7173]
In [37]:
#check
df_tokyo.iloc[6]
Out[37]:
Ward                  Shibuya
Density                 15080
District                Hiroo
District Latitude     35.6505
District Longitude    139.717
Name: 6, dtype: object
In [38]:
#Mita
df_tokyo.loc[df_tokyo['District'] == ' Mita']
Out[38]:
Ward Density District District Latitude District Longitude
13 Minato 12180 Mita 3.500009 -73.000009
In [39]:
df_tokyo.loc[13, ['District Latitude', 'District Longitude']] = [35.6472, 139.7409]
df_tokyo.iloc[13]
Out[39]:
Ward                   Minato
Density                 12180
District                 Mita
District Latitude     35.6472
District Longitude    139.741
Name: 13, dtype: object
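
The three fixes above address rows by position (16, 6 and 13), which breaks if the row order changes. As a hedged alternative, the sketch below keeps the corrections in a lookup keyed by district name and compares names after stripping spaces. It works on plain dictionaries so it runs on its own; the helper name apply_corrections is hypothetical.

```python
# Corrected coordinates quoted in the text above.
CORRECTIONS = {
    "Aoyama": (35.6720, 139.7230),
    "Hiroo": (35.6505, 139.7173),
    "Mita": (35.6472, 139.7409),
}


def apply_corrections(rows, corrections=CORRECTIONS):
    """Return copies of `rows` with corrected coordinates where a fix is known.

    Each row is a dict with 'District', 'District Latitude' and
    'District Longitude' keys; names are compared after stripping spaces.
    """
    fixed = []
    for row in rows:
        row = dict(row)  # copy so the input is left untouched
        key = row["District"].strip()
        if key in corrections:
            row["District Latitude"], row["District Longitude"] = corrections[key]
        fixed.append(row)
    return fixed


rows = [
    {"District": " Mita", "District Latitude": 3.500009, "District Longitude": -73.000009},
    {"District": "Ueno", "District Latitude": 35.713376, "District Longitude": 139.776656},
]
fixed_rows = apply_corrections(rows)
print(fixed_rows[0])
```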
In [40]:
# Inspect the dtypes after a cast; the result is not assigned, so gdf itself is unchanged
gdf.astype({'District Latitude': 'int32', 'District Longitude': 'int32'}).dtypes
Out[40]:
geometry              object
Ward                  object
Density                int64
District              object
District Latitude      int32
District Longitude     int32
dtype: object
In [41]:
#save
# Export the dataframe to CSV so that task 2 can start from the saved copy
df_tokyo.to_csv('df_tokyo.csv',index=False)
In [8]:
df_tokyo = pd.read_csv('df_tokyo.csv')
df_tokyo
Out[8]:
Ward Density District District Latitude District Longitude
0 Taitō 19830 Ueno 35.713376 139.776656
1 Taitō 19830 Asakusa 35.717597 139.797563
2 Shibuya 15080 Shibuya 35.664596 139.698711
3 Shibuya 15080 Ebisu 35.646438 139.710210
4 Shibuya 15080 Harajuku 35.668705 139.705336
5 Shibuya 15080 Daikanyama 35.648157 139.703293
6 Shibuya 15080 Hiroo 35.650500 139.717300
7 Sumida 18910 Kinshichō 35.696312 139.815043
8 Sumida 18910 Morishita 35.687998 139.797044
9 Sumida 18910 Ryōgoku 35.696854 139.797428
10 Minato 12180 Odaiba 35.619050 139.779364
11 Minato 12180 Shinbashi 35.665106 139.756116
12 Minato 12180 Hamamatsuchō 35.655111 139.757062
13 Minato 12180 Mita 35.647200 139.740900
14 Minato 12180 Roppongi 35.662457 139.733498
15 Minato 12180 Toranomon 35.670187 139.750056
16 Minato 12180 Aoyama 35.672000 139.723000
17 Minato 12180 Azabu 35.656402 139.733970
18 Minato 12180 Akasaka 35.671679 139.735622
In [43]:
#Recheck
# create map of districts in Tokyo using latitude and longitude values, to check if they are correct 
map_jp_err_recheck = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(df_tokyo['District Latitude'], df_tokyo['District Longitude'], df_tokyo['District']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='#b300fa',
        fill=True,
        fill_color='#b300fa',
        fill_opacity=0.7,
        parse_html=False).add_to(map_jp_err_recheck)  
    
map_jp_err_recheck
In [111]:
map_jp_err_recheck.save('visualize Recheck Map.html')

- Foursquare APIs

Now I have the wards and districts of Tokyo, along with the top 4 attractions. Next, I will use the Foursquare API to fetch the venues surrounding those locations and explore the districts and wards of the attractions.

In [9]:
CLIENT_ID = 'R3X02K1RQ43LXURRIDKKOE05Z1OOU531DTWXN2YYDGEKT5QB' # your Foursquare ID
CLIENT_SECRET = 'URUDLZJQNML4HHCDHF3A2GWAS2DR4QJAFXN2C12FNIJTZCFF' # your Foursquare Secret
ACCESS_TOKEN= '42UCQQVMZYZTOJUIHEHQYTKI1NUYE44RQ15A45XH3ZST31AS'
VERSION = '20180604'
LIMIT = 100 # limit of number of venues returned by Foursquare API
# radius = 1500 # define radius
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)
Your credentials:
CLIENT_ID: R3X02K1RQ43LXURRIDKKOE05Z1OOU531DTWXN2YYDGEKT5QB
CLIENT_SECRET:URUDLZJQNML4HHCDHF3A2GWAS2DR4QJAFXN2C12FNIJTZCFF
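
Hard-coding and printing API credentials exposes them to anyone who reads the notebook. A minimal sketch of an alternative is to read them from environment variables. The variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET and the helper load_foursquare_credentials are assumptions made for this example, not part of the original notebook or of the Foursquare API.

```python
import os


def load_foursquare_credentials(env=None):
    """Read the client ID and secret from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    names = ("FOURSQUARE_CLIENT_ID", "FOURSQUARE_CLIENT_SECRET")  # assumed names
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env[names[0]], env[names[1]]


# Demo mapping so the sketch runs without real credentials.
demo_env = {"FOURSQUARE_CLIENT_ID": "demo-id", "FOURSQUARE_CLIENT_SECRET": "demo-secret"}
CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials(demo_env)
print("Credentials loaded:", bool(CLIENT_ID and CLIENT_SECRET))
```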
In [11]:
#Create a function to get the top 100 venues in each district
#Venue Recommendations
def getNearbyVenues(names, latitudes, longitudes, radius):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],
#             v['venue']['location']['address'],
            v['venue']['location']['distance'],
            v['venue']['categories'][0]['name'],
            v['venue']['id']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['District', 
                  'District Latitude', 
                  'District Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude',
#                   'Address',
                  'Venue distance',
                  'Venue Category',
                  'Venue ID']
    
    return(nearby_venues)


#-------------------------------------------------------------------------------------------------
#Venue Details

def get_venue_details(venue_id):
    # URL to fetch venue details from the Foursquare API (the endpoint path is "venues", plural)
    url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(
            venue_id, 
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION)
    # get all the data
    results = requests.get(url).json()
    print(results)
    venue_data=results['response']['venue']
    venue_details=[]
    try:
        venue_id=venue_data['id']
        venue_name=venue_data['name']
        venue_likes=venue_data['likes']['count']
        venue_rating=venue_data['rating']
        venue_tips=venue_data['tips']['count']
        venue_details.append([venue_id,venue_name,venue_likes,venue_rating,venue_tips])
    except KeyError:
        pass
    column_names=['ID','Name','Likes','Rating','Tips']
    df = pd.DataFrame(venue_details,columns=column_names)
    return df
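
The request URLs above are assembled with str.format. As a sketch of an alternative, the standard library's urlencode can build the query string from a dictionary, which keeps the parameters readable and escapes special characters. The helper name build_explore_url is hypothetical; the endpoint and parameter names are the ones already used in getNearbyVenues, and the credentials shown are dummy values.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"  # endpoint used above


def build_explore_url(lat, lng, radius, limit, client_id, client_secret, version):
    """Build the explore request URL from named parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)


url = build_explore_url(35.713376, 139.776656, 2500, 100, "demo-id", "demo-secret", "20180604")
query = parse_qs(urlsplit(url).query)  # parse it back to inspect the parameters
print(query["ll"], query["radius"])
```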

Use the Foursquare API (venue recommendations) to explore up to 100 top venues within a radius of 2,500 m of each district, and create a new dataframe, "Tokyo_venues".

In [14]:
Tokyo_venues = getNearbyVenues(names=df_tokyo['District'],
                                   latitudes=df_tokyo['District Latitude'],
                                   longitudes=df_tokyo['District Longitude'],
                                   radius=2500
                                  )
Ueno
 Asakusa
Shibuya
 Ebisu
 Harajuku
 Daikanyama
 Hiroo
Kinshichō
 Morishita
 Ryōgoku
Odaiba
 Shinbashi
 Hamamatsuchō
 Mita
 Roppongi
 Toranomon
 Aoyama
 Azabu
 Akasaka
In [15]:
Tokyo_venues_csv = Tokyo_venues.merge(gdf[['District','Ward']], on='District')
Tokyo_venues_csv
Out[15]:
District District Latitude District Longitude Venue Venue Latitude Venue Longitude Venue distance Venue Category Venue ID Ward
0 Ueno 35.713376 139.776656 National Museum of Nature and Science (国立科学博物館) 35.715898 139.776683 280 Science Museum 4b612a56f964a520ae0b2ae3 Taitō
1 Ueno 35.713376 139.776656 Ueno Park (上野恩賜公園) 35.714675 139.773487 320 Park 4b0587a1f964a5209b9d22e3 Taitō
2 Ueno 35.713376 139.776656 Motsuyaki Daitoryo (もつ焼き大統領 支店) 35.710369 139.775326 355 Sake Bar 4c415399a5c5ef3bc736b06f Taitō
3 Ueno 35.713376 139.776656 Renkon (れんこん) 35.709834 139.774245 450 Sake Bar 4b5d73b2f964a520615d29e3 Taitō
4 Ueno 35.713376 139.776656 National Museum of Western Art (国立西洋美術館) 35.715362 139.775880 231 Art Museum 4b556c34f964a520ece327e3 Taitō
... ... ... ... ... ... ... ... ... ... ...
1895 Akasaka 35.671679 139.735622 Prince Chichibu Memorial Rugby Stadium (秩父宮ラグビー場) 35.672604 139.718173 1581 Rugby Stadium 4b599fe4f964a520808f28e3 Minato
1896 Akasaka 35.671679 139.735622 Crisp Salad Works 35.660261 139.729985 1369 Salad Place 56bd6592498ec2feaf04f7b9 Minato
1897 Akasaka 35.671679 139.735622 Taiyaki Wakaba (たいやき わかば) 35.685970 139.726919 1774 Wagashi Place 4b67c6bcf964a5206d5d2be3 Minato
1898 Akasaka 35.671679 139.735622 Grand Club Lounge (グランド クラブ ラウンジ) 35.659613 139.728134 1504 Hotel Bar 4bf1def23fa220a1efd41820 Minato
1899 Akasaka 35.671679 139.735622 Fureika (中国飯店 富麗華) 35.656269 139.738481 1734 Chinese Restaurant 4b5ac0bdf964a52050d328e3 Minato

1900 rows × 10 columns

In [57]:
Tokyo_venues_csv.to_csv('Tokyo_venues_csv.csv',index=False)
In [58]:
Tokyo_venues_csv = pd.read_csv('Tokyo_venues_csv.csv')

Although the Foursquare API found many venues, the same venue can be returned for more than one district when two district centers lie close together (each search covers a 2,500 m radius). So, let's check whether any venues are duplicated.

Count how many duplicates there are...

In [17]:
Tokyo_venues_csv.duplicated('Venue ID').value_counts()
Out[17]:
True     973
False    927
dtype: int64

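Drop duplicates, keeping for each venue only the row with the smallest distance (the rows are sorted by distance within each venue ID first, so that row comes first)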

In [18]:
Tokyo_venues_csv = Tokyo_venues_csv.sort_values(['Venue ID','Venue distance'] , ascending=[False, True])
Tokyo_venues_csv = Tokyo_venues_csv.drop_duplicates(subset='Venue ID', keep='first')
Tokyo_venues_csv
Out[18]:
District District Latitude District Longitude Venue Venue Latitude Venue Longitude Venue distance Venue Category Venue ID Ward
1062 Odaiba 35.619050 139.779364 MUJI (無印良品) 35.638366 139.791723 2423 Clothing Store 5fc71d38844d440810c59914 Minato
1060 Odaiba 35.619050 139.779364 Burger King (バーガーキング) 35.638170 139.791205 2382 Fast Food Restaurant 5f781e3974a1ae52a3666e4b Minato
1065 Odaiba 35.619050 139.779364 Daiso (ダイソー) 35.638277 139.791455 2403 Discount Store 5f097ab824104901a8d30ff8 Minato
1076 Odaiba 35.619050 139.779364 イオンスタイル 35.638251 139.793349 2483 Supermarket 5ebe3f4c00cf4e00089a9b48 Minato
784 Kinshichō 35.696312 139.815043 Daiso (ダイソー) 35.689619 139.810797 838 Discount Store 5de0673d1a7b220008d26014 Sumida
... ... ... ... ... ... ... ... ... ... ...
1417 Roppongi 35.662457 139.733498 Maduro (ジャズラウンジ マデュロ) 35.659933 139.728515 531 Jazz Club 4b05879ef964a520dd9c22e3 Minato
1419 Roppongi 35.662457 139.733498 Grand Hyatt Tokyo (グランドハイアット東京) 35.659759 139.728354 553 Hotel 4b05879cf964a5205b9c22e3 Minato
1530 Toranomon 35.670187 139.750056 The Peninsula Tokyo (ザ・ペニンシュラ東京) 35.674724 139.760553 1075 Hotel 4b05879cf964a520589c22e3 Minato
1411 Roppongi 35.662457 139.733498 The Ritz-Carlton Tokyo (ザ・リッツ・カールトン東京) 35.666327 139.731358 472 Hotel 4b05879cf964a520579c22e3 Minato
1111 Shinbashi 35.665106 139.756116 Conrad Tokyo (コンラッド東京) 35.662973 139.760841 488 Hotel 4b05879cf964a5204e9c22e3 Minato

927 rows × 10 columns

In [19]:
#Recheck
Tokyo_venues_csv.duplicated('Venue ID').value_counts()
Out[19]:
False    927
dtype: int64
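
The sort-then-drop step above keeps, for each venue, the row from the nearest district. The same rule can be sketched in plain Python, which may make the logic easier to follow. The helper name keep_nearest and the sample rows are illustrative only.

```python
def keep_nearest(rows):
    """Keep one row per 'Venue ID': the one with the smallest 'Venue distance'."""
    best = {}
    for row in rows:
        venue_id = row["Venue ID"]
        if venue_id not in best or row["Venue distance"] < best[venue_id]["Venue distance"]:
            best[venue_id] = row
    return list(best.values())


# Illustrative rows: venue "a" was returned for two districts.
rows = [
    {"Venue ID": "a", "District": "Ueno", "Venue distance": 900},
    {"Venue ID": "a", "District": "Asakusa", "Venue distance": 300},
    {"Venue ID": "b", "District": "Ueno", "Venue distance": 120},
]
unique_rows = keep_nearest(rows)
print(len(unique_rows))
```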
In [20]:
#save the change
Tokyo_venues_csv.to_csv('Tokyo_venues_csv.csv',index=False)
In [21]:
Tokyo_venues_csv = pd.read_csv('Tokyo_venues_csv.csv')

Let's see how many districts there are in these 4 wards

In [22]:
tot_distr = Tokyo_venues_csv['District'].unique()
print (tot_distr)
num_distc = tot_distr.shape
num_distc
['Odaiba' 'Kinshichō' 'Shibuya' ' Shinbashi' ' Morishita' ' Aoyama'
 ' Asakusa' 'Ueno' ' Daikanyama' ' Roppongi' ' Ryōgoku' ' Mita' ' Akasaka'
 ' Harajuku' ' Azabu' ' Toranomon' ' Ebisu' ' Hiroo' ' Hamamatsuchō']
Out[22]:
(19,)
In [23]:
# Count the unique venue categories in the districts
tot_unique = Tokyo_venues_csv['Venue Category'].unique()
tot_unique.shape
Out[23]:
(181,)

So, there are 19 districts in 4 wards, with 181 unique venue categories.

Let's visualize the venues in the districts...

In [24]:
# create map
map_num_distc = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, poi, label in zip(Tokyo_venues_csv['Venue Latitude'], Tokyo_venues_csv['Venue Longitude'], Tokyo_venues_csv['Ward'], Tokyo_venues_csv['Venue']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=1,
        parse_html=True).add_to(map_num_distc)  
    
map_num_distc
In [25]:
map_num_distc.save('map_num_distc.html')

Let's assign a number to each ward so that the venues can be colored by ward...

In [26]:
Tokyo_venues_csv['No.Ward'] = Tokyo_venues_csv['Ward'].astype('category').cat.codes


Tokyo_venues_csv.tail(20)
Out[26]:
District District Latitude District Longitude Venue Venue Latitude Venue Longitude Venue distance Venue Category Venue ID Ward No.Ward
907 Shinbashi 35.665106 139.756116 CHANEL 35.673032 139.766544 1291 Boutique 4b0587baf964a52060a122e3 Minato 0
908 Roppongi 35.662457 139.733498 LIVING MOTIF (リビング・モティーフ) 35.661665 139.736610 294 Furniture / Home Store 4b0587b8f964a52029a122e3 Minato 0
909 Asakusa 35.717597 139.797563 Asakusa Engei Hall (浅草演芸ホール) 35.713473 139.793112 610 Comedy Club 4b0587b3f964a5206ba022e3 Taitō 3
910 Hiroo 35.650500 139.717300 Blue Note Tokyo 35.661158 139.716134 1191 Jazz Club 4b0587b2f964a52042a022e3 Shibuya 1
911 Akasaka 35.671679 139.735622 Suntory Hall (サントリーホール) 35.667274 139.740544 662 Concert Hall 4b0587b2f964a5203fa022e3 Minato 0
912 Aoyama 35.672000 139.723000 Narisawa 35.671516 139.722117 96 French Restaurant 4b0587b1f964a52034a022e3 Minato 0
913 Morishita 35.687998 139.797044 Tapas Molecular Bar (タパス モラキュラーバー) 35.687147 139.772867 2188 Tapas Restaurant 4b0587b1f964a5202ca022e3 Sumida 2
914 Shibuya 35.664596 139.698711 Chez Matsuo (シェ松尾 松濤レストラン) 35.660804 139.692413 708 French Restaurant 4b0587acf964a520639f22e3 Shibuya 1
915 Roppongi 35.662457 139.733498 Mori Art Museum (森美術館) 35.660367 139.729222 451 Art Museum 4b0587a8f964a520ca9e22e3 Minato 0
916 Ueno 35.713376 139.776656 Ueno Park (上野恩賜公園) 35.714675 139.773487 320 Park 4b0587a1f964a5209b9d22e3 Taitō 3
917 Asakusa 35.717597 139.797563 Senso-ji Temple (浅草寺) 35.714662 139.796761 334 Buddhist Temple 4b0587a1f964a5207f9d22e3 Taitō 3
918 Shibuya 35.664596 139.698711 JZ Brat 35.656138 139.699217 942 Jazz Club 4b05879ff964a520379d22e3 Shibuya 1
919 Shinbashi 35.665106 139.756116 China Blue (チャイナブルー) 35.663054 139.761232 515 Chinese Restaurant 4b05879ff964a520229d22e3 Minato 0
920 Toranomon 35.670187 139.750056 The Peninsula Boutique & Café (ザ・ペニンシュラ ブティック&... 35.674771 139.760558 1078 Café 4b05879ff964a5201c9d22e3 Minato 0
921 Ueno 35.713376 139.776656 Cafe Mai:lish 35.702328 139.769425 1392 Theme Restaurant 4b05879ef964a520f19c22e3 Taitō 3
922 Roppongi 35.662457 139.733498 Maduro (ジャズラウンジ マデュロ) 35.659933 139.728515 531 Jazz Club 4b05879ef964a520dd9c22e3 Minato 0
923 Roppongi 35.662457 139.733498 Grand Hyatt Tokyo (グランドハイアット東京) 35.659759 139.728354 553 Hotel 4b05879cf964a5205b9c22e3 Minato 0
924 Toranomon 35.670187 139.750056 The Peninsula Tokyo (ザ・ペニンシュラ東京) 35.674724 139.760553 1075 Hotel 4b05879cf964a520589c22e3 Minato 0
925 Roppongi 35.662457 139.733498 The Ritz-Carlton Tokyo (ザ・リッツ・カールトン東京) 35.666327 139.731358 472 Hotel 4b05879cf964a520579c22e3 Minato 0
926 Shinbashi 35.665106 139.756116 Conrad Tokyo (コンラッド東京) 35.662973 139.760841 488 Hotel 4b05879cf964a5204e9c22e3 Minato 0
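
For string labels, astype('category').cat.codes numbers the distinct values in sorted order, which is why Minato is 0 and Taitō is 3 in the table above. A plain-Python sketch of the same numbering, with a hypothetical helper category_codes:

```python
def category_codes(labels):
    """Map each label to the position of its value in the sorted distinct labels."""
    categories = sorted(set(labels))
    lookup = {label: code for code, label in enumerate(categories)}
    return [lookup[label] for label in labels], lookup


codes, lookup = category_codes(["Taitō", "Minato", "Shibuya", "Sumida", "Minato"])
print(lookup)
print(codes)
```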
In [27]:
Tokyo_venues_csv.to_csv('Tokyo_venues_csv.csv',index=False)
In [28]:
Tokyo_venues_csv = pd.read_csv('Tokyo_venues_csv.csv')
In [29]:
#Clustered venues by ward
map_num_distc2 = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme: one color per ward (No.Ward takes the values 0..3)
n_wards = 4
colors_array = cm.spring(np.linspace(0, 1, n_wards))
spring = [colors.rgb2hex(i) for i in colors_array]



# add markers to the map
markers_colors = []
for lat, lng, poi, cluster in zip(Tokyo_venues_csv['Venue Latitude'], Tokyo_venues_csv['Venue Longitude'], Tokyo_venues_csv['Venue'], Tokyo_venues_csv['No.Ward']):
    label = folium.Popup(str(poi), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=spring[cluster-1],
        fill=True,
        fill_color=spring[cluster-1],
        fill_opacity=0.5).add_to(map_num_distc2)

# Display the attractions
# add markers to map
for lat, lng, label in zip(df['Attraction Latitude'], df['Attraction Longitude'], df['Attraction']):
    label = folium.Popup(label, parse_html=True)
    folium.Marker(
    [lat, lng],
    popup=label,
    icon = folium.Icon(color='cadetblue',icon = 'heart',prefix='glyphicon')
    ).add_to(map_num_distc2) 
       
map_num_distc2
In [31]:
map_num_distc2.save('map_num_distc_colored.html')

Get ratings and likes to confirm the popularity of the attractions

First, let's extract venue IDs

- Sensoji

In [32]:
s = Tokyo_venues_csv[Tokyo_venues_csv['Venue'].str.contains("Senso-ji")]
s
Out[32]:
District District Latitude District Longitude Venue Venue Latitude Venue Longitude Venue distance Venue Category Venue ID Ward No.Ward
917 Asakusa 35.717597 139.797563 Senso-ji Temple (浅草寺) 35.714662 139.796761 334 Buddhist Temple 4b0587a1f964a5207f9d22e3 Taitō 3
In [33]:
Sensoji = s.iloc[0]['Venue ID']  # select the ID by column name rather than by position
Sensoji
Out[33]:
'4b0587a1f964a5207f9d22e3'

- Tokyo Skytree

In [38]:
t = Tokyo_venues_csv[Tokyo_venues_csv['Venue'].str.contains("Tokyo Skytree")]
t
Out[38]:
District District Latitude District Longitude Venue Venue Latitude Venue Longitude Venue distance Venue Category Venue ID Ward No.Ward
321 Asakusa 35.717597 139.797563 Tokyo Skytree Tembo Galleria (東京スカイツリー天望回廊) 35.710132 139.81073 1451 Scenic Lookout 4fb4acd3e4b06fa0ebcb9f87 Taitō 3
869 Asakusa 35.717597 139.797563 Tokyo Skytree (東京スカイツリー) 35.710054 139.81071 1455 Monument / Landmark 4b569977f964a520551628e3 Taitō 3

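Foursquare returns the correct coordinates for Tokyo Skytree (東京スカイツリー), but the dataframe lists it under the Asakusa district in Taitō ward, apparently because it was returned by the search around Asakusa (venue distance 1455). According to Google Maps, 35°42'36.2"N 139°48'38.6"E is actually "1 Chome-1-83 Oshiage, Sumida City, Tokyo 131-0045, Japan", so the venue belongs to Sumida ward.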

In [39]:
TokyoSkytree = t.iloc[1]['Venue ID']  # second row: the landmark itself, not the Tembo Galleria
TokyoSkytree
Out[39]:
'4b569977f964a520551628e3'

- Tokyo Tower

In [41]:
tt = Tokyo_venues_csv[Tokyo_venues_csv['Venue'].str.contains("Tower")]  # search in df

TokyoTower = tt.iloc[0]['Venue ID']  # extract the ID of the first match
TokyoTower
Out[41]:
'4b56a5e8f964a5208e1728e3'

- Meiji Shrine

In [42]:
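# "Meiji|Shrine" is a regex matching either word; the first hit here is Meiji Jingu Gyoen (the garden)
MJ = Tokyo_venues_csv[Tokyo_venues_csv['Venue'].str.contains("Meiji|Shrine")]
Meiji = MJ.iloc[0]['Venue ID']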
Meiji
Out[42]:
'4c1a916a63750f475a7fb367'
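
Because `str.contains` interprets its pattern as a regular expression by default, a pattern such as "Meiji|Shrine" matches either word, and the first match is not necessarily the intended venue. Here is a minimal sketch of a stricter lookup. The helper `find_venue_id` and the toy dataframe with its IDs are hypothetical and only illustrate the idea; they are not part of the original notebook.

```python
import pandas as pd

def find_venue_id(venues, name):
    """Return the 'Venue ID' of the first venue whose name contains `name` literally."""
    # regex=False treats `name` as plain text, so '|' and '(' have no special meaning
    mask = venues['Venue'].str.contains(name, regex=False, na=False)
    matches = venues.loc[mask, 'Venue ID']
    if matches.empty:
        raise KeyError('no venue matching {!r}'.format(name))
    return matches.iloc[0]

# toy data for illustration only (hypothetical names and IDs)
toy = pd.DataFrame({
    'Venue': ['Some Shrine', 'Meiji Jingu Shrine', 'Tokyo Tower'],
    'Venue ID': ['id-a', 'id-b', 'id-c'],
})

meiji_id = find_venue_id(toy, 'Meiji Jingu Shrine')
# the regex pattern used above would pick the first row that contains either word
loose_id = toy[toy['Venue'].str.contains("Meiji|Shrine")].iloc[0]['Venue ID']
```

With the toy data, the literal lookup and the regex lookup return different rows, which is the situation to watch out for when the extracted ID is later sent to the API.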

Get ratings and likes from Foursquare using the Get Details of a Venue API

In [43]:
venube_id = Sensoji, TokyoSkytree,TokyoTower, Meiji
venube_idd = pd.DataFrame(venube_id, columns = ['Venue ID'])
venube_idd
Out[43]:
Venue ID
0 4b0587a1f964a5207f9d22e3
1 4b569977f964a520551628e3
2 4b56a5e8f964a5208e1728e3
3 4c1a916a63750f475a7fb367
In [44]:
# venube_idd = venube_idd.merge(Tokyo_venues_csv[['Venue ID', 'Venue']], on='VenueID')
result_m = pd.merge(venube_idd, Tokyo_venues_csv, on=["Venue ID"])
result_m
Out[44]:
Venue ID District District Latitude District Longitude Venue Venue Latitude Venue Longitude Venue distance Venue Category Ward No.Ward
0 4b0587a1f964a5207f9d22e3 Asakusa 35.717597 139.797563 Senso-ji Temple (浅草寺) 35.714662 139.796761 334 Buddhist Temple Taitō 3
1 4b569977f964a520551628e3 Asakusa 35.717597 139.797563 Tokyo Skytree (東京スカイツリー) 35.710054 139.810710 1455 Monument / Landmark Taitō 3
2 4b56a5e8f964a5208e1728e3 Azabu 35.656402 139.733970 Tokyo Tower (東京タワー) 35.658579 139.745442 1065 Monument / Landmark Minato 0
3 4c1a916a63750f475a7fb367 Harajuku 35.668705 139.705336 Meiji Jingu Gyoen (明治神宮御苑) 35.673782 139.700319 724 Garden Shibuya 1
In [45]:
# Ratings & likes of venues
rating_rows = []

for venue_id in result_m['Venue ID']:
    url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)
    venue = requests.get(url).json()['response']['venue']

    # .get() yields None instead of a KeyError if a field is absent from the response
    rating_rows.append((venue['name'],
                        venue.get('rating'),
                        venue['likes'].get('count')))

# build the dataframe once, after all venues have been fetched
rate_df = pd.DataFrame(rating_rows, columns=['Venue', 'Rating', 'Likes'])
In [46]:
rate_df
Out[46]:
Venue Rating Likes
0 Senso-ji Temple (浅草寺) 8.8 2777
1 Tokyo Skytree (東京スカイツリー) 8.5 2548
2 Tokyo Tower (東京タワー) 8.8 846
3 Meiji Jingu Gyoen (明治神宮御苑) 8.8 116
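
The loop above reads three fields from each response: the venue name, its rating, and its like count. The sketch below isolates that extraction step so it can be checked without calling the API. The function `extract_rating_likes` and both payloads are hypothetical; they only mirror the fields the loop reads and are not a description of the full Foursquare response.

```python
def extract_rating_likes(response):
    """Pull (name, rating, likes) out of a venue-details 'response' dict."""
    venue = response.get('venue', {})
    # every lookup uses .get(), so a missing field becomes None instead of raising
    return (venue.get('name'),
            venue.get('rating'),
            venue.get('likes', {}).get('count'))

# hypothetical payloads shaped like the fields read in the loop above
rated = {'venue': {'name': 'A', 'rating': 8.8, 'likes': {'count': 10}}}
unrated = {'venue': {'name': 'B', 'likes': {'count': 0}}}

rows = [extract_rating_likes(r) for r in (rated, unrated)]
```

Keeping the parsing in a small function like this makes it easier to decide what should happen to a venue with no rating before the values end up in a dataframe.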

- Most Common Venues

Now let's go back to the venue dataframe to find the most common venue categories in each area. First, a quick check of how many venues were returned for each district:

In [47]:
Tokyo_venues_Category = Tokyo_venues_csv.groupby(Tokyo_venues_csv['District']).count()
Tokyo_venues_Category = Tokyo_venues_Category.drop(columns=['Ward', 'No.Ward'])
Tokyo_venues_Category
Out[47]:
District Latitude District Longitude Venue Venue Latitude Venue Longitude Venue distance Venue Category Venue ID
District
Akasaka 38 38 38 38 38 38 38 38
Aoyama 24 24 24 24 24 24 24 24
Asakusa 71 71 71 71 71 71 71 71
Azabu 22 22 22 22 22 22 22 22
Daikanyama 50 50 50 50 50 50 50 50
Ebisu 30 30 30 30 30 30 30 30
Hamamatsuchō 16 16 16 16 16 16 16 16
Harajuku 48 48 48 48 48 48 48 48
Hiroo 27 27 27 27 27 27 27 27
Mita 42 42 42 42 42 42 42 42
Morishita 62 62 62 62 62 62 62 62
Roppongi 49 49 49 49 49 49 49 49
Ryōgoku 35 35 35 35 35 35 35 35
Shinbashi 65 65 65 65 65 65 65 65
Toranomon 24 24 24 24 24 24 24 24
Kinshichō 72 72 72 72 72 72 72 72
Odaiba 100 100 100 100 100 100 100 100
Shibuya 60 60 60 60 60 60 60 60
Ueno 92 92 92 92 92 92 92 92
In [48]:
print('There are {} unique categories in 4 wards/19 districts.'.format(len(Tokyo_venues_csv['Venue Category'].unique())))
There are 181 unique categories in 4 wards/19 districts.
In [49]:
# create a dataframe of the 15 most frequent categories
# rename_axis/reset_index(name=...) sets both column names without relying on the default 'index' label
Tokyo_Venues_Top15 = (Tokyo_venues_csv['Venue Category']
                      .value_counts()
                      .head(15)
                      .rename_axis('Venue_Category')
                      .reset_index(name='Frequency'))
Tokyo_Venues_Top15
Out[49]:
Venue_Category Frequency
0 Coffee Shop 56
1 Japanese Restaurant 42
2 BBQ Joint 41
3 Café 39
4 Sake Bar 31
5 Sushi Restaurant 26
6 Ramen Restaurant 24
7 Chinese Restaurant 23
8 Hotel 20
9 Bakery 20
10 Soba Restaurant 19
11 Convenience Store 17
12 Wagashi Place 16
13 French Restaurant 15
14 Tonkatsu Restaurant 15
In [50]:
import seaborn as sns
from matplotlib import pyplot as plt

# create the figure first so that the bar plot is drawn on it
fig = plt.figure(figsize=(18,7))
s=sns.barplot(x="Venue_Category", y="Frequency", data= Tokyo_Venues_Top15)
s.set_xticklabels(s.get_xticklabels(), rotation=45, horizontalalignment='right')

plt.title("15 Most Frequent Venues in the Wards" , fontsize=15)
plt.xlabel("Venue Category", fontsize=5)
plt.ylabel ("Frequency", fontsize=8)
plt.tight_layout()
# save before calling plt.show()
plt.savefig("Most_Freq_Venues1.png", dpi=300,  bbox_inches = "tight")
plt.show()

Analyze each district

How many unique venue categories are there in each district?

In [51]:
no_venues_each = Tokyo_venues_csv.groupby('District')['Venue Category'].nunique()
no_venues_eachh= pd.DataFrame(no_venues_each)
no_venues_eachh = no_venues_eachh.rename(columns = {"Venue Category" : "NoofCategory"}).reset_index()
In [52]:
list_ven_no =no_venues_eachh['NoofCategory'].to_list()
list_dist =no_venues_eachh['District'].to_list()
In [53]:
no_venues_eachh
Out[53]:
District NoofCategory
0 Akasaka 23
1 Aoyama 17
2 Asakusa 39
3 Azabu 22
4 Daikanyama 33
5 Ebisu 23
6 Hamamatsuchō 15
7 Harajuku 37
8 Hiroo 18
9 Mita 29
10 Morishita 39
11 Roppongi 33
12 Ryōgoku 22
13 Shinbashi 45
14 Toranomon 22
15 Kinshichō 45
16 Odaiba 54
17 Shibuya 41
18 Ueno 49
In [54]:
palett = sns.color_palette("viridis", len(no_venues_eachh))  # one color per district

# create the figure first so that the bar plot is drawn on it
fig = plt.figure(figsize=(18,7))
ss=sns.barplot(x="NoofCategory", y="District", data= no_venues_eachh,palette= palett)

plt.title('Number of Venue Categories in Each District', fontsize=15)
plt.xlabel("Number of Venue Categories")
plt.ylabel ("District")
plt.savefig("Unique_venues_each_district.png", dpi=300, bbox_inches = "tight")
plt.show()

Analyze each ward

How many unique venue categories are there in each ward?

In [55]:
# How many unique venue categories are there in each of the 4 wards?
no_venues_each_w = Tokyo_venues_csv.groupby('Ward')['Venue Category'].nunique()
no_venues_eachh_w= pd.DataFrame(no_venues_each_w)
no_venues_eachh_w = no_venues_eachh_w.rename(columns = {"Venue Category" : "NoofCategory"}).reset_index()
In [56]:
list_ven_no_w =no_venues_eachh_w['NoofCategory'].to_list()
list_ward =no_venues_eachh_w['Ward'].to_list()
In [57]:
no_venues_eachh_w
Out[57]:
Ward NoofCategory
0 Minato 124
1 Shibuya 87
2 Sumida 69
3 Taitō 74
In [58]:
no_venues_eachh_w.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 2 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   Ward          4 non-null      object
 1   NoofCategory  4 non-null      int64 
dtypes: int64(1), object(1)
memory usage: 192.0+ bytes
In [59]:
# create the figure first so that the bar plot is drawn on it
fig = plt.figure(figsize=(18,7))
ss=sns.barplot(x="NoofCategory", y="Ward", data= no_venues_eachh_w,palette= "coolwarm")

plt.title('Number of Venue Categories in Each Ward', fontsize=15)
plt.xlabel("Number of Venue Categories")
plt.ylabel ("Ward")
plt.savefig("Unique_venues_each_ward.png", dpi=300,  bbox_inches = "tight")
plt.show()

See which districts are in Minato, the ward with the most unique venue categories:

In [60]:
numdist_Minato = Tokyo_venues_csv["District"][Tokyo_venues_csv["Ward"]=='Minato'].unique()

numdist_Minato.tolist()
Out[60]:
['Odaiba',
 ' Shinbashi',
 ' Aoyama',
 ' Roppongi',
 ' Mita',
 ' Akasaka',
 ' Azabu',
 ' Toranomon',
 ' Hamamatsuchō']
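
The list above shows a leading space in most district names (' Shinbashi', ' Aoyama', and so on). This appears to be why Kinshichō, Odaiba, Shibuya and Ueno are sorted after Toranomon in the grouped tables, since a space sorts before any letter. Here is a minimal sketch of one way to normalize the names, shown on a toy dataframe rather than on the real data:

```python
import pandas as pd

# toy frame mimicking the leading spaces seen in the output above
toy = pd.DataFrame({'District': ['Odaiba', ' Shinbashi', ' Aoyama', ' Shinbashi']})

# strip surrounding whitespace so ' Shinbashi' and 'Shinbashi' become the same key
toy['District'] = toy['District'].str.strip()

districts = sorted(toy['District'].unique())
```

If this cleanup were applied to the venue dataframe, the same stripping would also have to be applied to any other dataframe that is later joined on 'District', otherwise the keys would no longer match.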

Machine Learning

To help a tourist decide where to go, I will use K-means clustering, an unsupervised machine learning algorithm that groups data points into clusters based on their similarity.

One hot encoding:

In [69]:
# get_dummies creates one 0/1 indicator column per venue category;
# each row has a 1 only in the column of its own category

tokyo_onehot = pd.get_dummies(Tokyo_venues_csv[['Venue Category']], prefix="", prefix_sep="")

# add district column back to dataframe
tokyo_onehot['District'] = Tokyo_venues_csv['District'] 

# move district column to the first column
fixed_columns = [tokyo_onehot.columns[-1]] + list(tokyo_onehot.columns[:-1])
tokyo_onehot = tokyo_onehot[fixed_columns]

tokyo_onehot[100:300]
Out[69]:
District Accessories Store American Restaurant Art Gallery Art Museum Arts & Crafts Store Athletics & Sports Australian Restaurant BBQ Joint Bagel Shop ... Tunnel Udon Restaurant Unagi Restaurant Vegetarian / Vegan Restaurant Vietnamese Restaurant Wagashi Place Wine Bar Yakitori Restaurant Yoshoku Restaurant Zoo
100 Akasaka 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
101 Harajuku 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
102 Ueno 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
103 Ebisu 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
104 Ebisu 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
295 Roppongi 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
296 Harajuku 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
297 Morishita 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
298 Morishita 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
299 Mita 0 0 0 0 0 0 0 1 0 ... 0 0 0 0 0 0 0 0 0 0

200 rows × 182 columns

In [63]:
tokyo_onehot.shape
Out[63]:
(927, 182)

Next, let's group the rows by district and take the mean of each one-hot column, which gives the frequency of occurrence of each category in that district.

In [64]:
Tokyo_grouped = tokyo_onehot.groupby('District').mean().reset_index()
Tokyo_grouped
Out[64]:
District Accessories Store American Restaurant Art Gallery Art Museum Arts & Crafts Store Athletics & Sports Australian Restaurant BBQ Joint Bagel Shop ... Tunnel Udon Restaurant Unagi Restaurant Vegetarian / Vegan Restaurant Vietnamese Restaurant Wagashi Place Wine Bar Yakitori Restaurant Yoshoku Restaurant Zoo
0 Akasaka 0.000000 0.026316 0.000000 0.000000 0.000000 0.000000 0.000000 0.105263 0.00000 ... 0.00 0.000000 0.026316 0.00000 0.000000 0.026316 0.000000 0.026316 0.000000 0.00000
1 Aoyama 0.000000 0.000000 0.000000 0.041667 0.000000 0.000000 0.000000 0.000000 0.00000 ... 0.00 0.000000 0.000000 0.00000 0.000000 0.083333 0.041667 0.000000 0.000000 0.00000
2 Asakusa 0.000000 0.000000 0.000000 0.000000 0.014085 0.014085 0.000000 0.028169 0.00000 ... 0.00 0.000000 0.028169 0.00000 0.000000 0.028169 0.000000 0.000000 0.028169 0.00000
3 Azabu 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000 ... 0.00 0.045455 0.045455 0.00000 0.000000 0.000000 0.000000 0.045455 0.000000 0.00000
4 Daikanyama 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.020000 0.00000 ... 0.00 0.000000 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000
5 Ebisu 0.000000 0.033333 0.000000 0.033333 0.000000 0.000000 0.000000 0.066667 0.00000 ... 0.00 0.066667 0.000000 0.00000 0.000000 0.033333 0.000000 0.000000 0.000000 0.00000
6 Hamamatsuchō 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.062500 0.00000 ... 0.00 0.062500 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000
7 Harajuku 0.020833 0.000000 0.020833 0.020833 0.020833 0.000000 0.000000 0.000000 0.00000 ... 0.00 0.000000 0.000000 0.00000 0.000000 0.020833 0.000000 0.000000 0.000000 0.00000
8 Hiroo 0.000000 0.000000 0.000000 0.037037 0.000000 0.000000 0.000000 0.111111 0.00000 ... 0.00 0.000000 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000
9 Mita 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.142857 0.02381 ... 0.00 0.023810 0.000000 0.00000 0.000000 0.047619 0.000000 0.023810 0.000000 0.00000
10 Morishita 0.000000 0.000000 0.000000 0.016129 0.000000 0.000000 0.000000 0.048387 0.00000 ... 0.00 0.016129 0.000000 0.00000 0.000000 0.032258 0.016129 0.048387 0.016129 0.00000
11 Roppongi 0.000000 0.000000 0.020408 0.040816 0.000000 0.000000 0.000000 0.061224 0.00000 ... 0.00 0.000000 0.000000 0.00000 0.020408 0.000000 0.000000 0.020408 0.000000 0.00000
12 Ryōgoku 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.028571 0.00000 ... 0.00 0.000000 0.000000 0.00000 0.000000 0.000000 0.028571 0.028571 0.000000 0.00000
13 Shinbashi 0.000000 0.000000 0.015385 0.000000 0.000000 0.000000 0.015385 0.000000 0.00000 ... 0.00 0.015385 0.000000 0.00000 0.000000 0.015385 0.000000 0.000000 0.015385 0.00000
14 Toranomon 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000 ... 0.00 0.000000 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.041667 0.00000
15 Kinshichō 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.083333 0.00000 ... 0.00 0.013889 0.000000 0.00000 0.013889 0.013889 0.027778 0.013889 0.000000 0.00000
16 Odaiba 0.000000 0.020000 0.000000 0.010000 0.000000 0.000000 0.010000 0.000000 0.00000 ... 0.01 0.000000 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000
17 Shibuya 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.033333 0.00000 ... 0.00 0.000000 0.000000 0.00000 0.016667 0.000000 0.033333 0.000000 0.016667 0.00000
18 Ueno 0.000000 0.000000 0.000000 0.021739 0.000000 0.000000 0.000000 0.076087 0.00000 ... 0.00 0.021739 0.010870 0.01087 0.000000 0.032609 0.010870 0.000000 0.010870 0.01087

19 rows × 182 columns
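
To see why the mean of the one-hot columns can be read as a frequency, here is a small worked example on made-up data (the districts 'A' and 'B' and their venues are purely illustrative):

```python
import pandas as pd

# toy data: district A has 4 venues, district B has 2
toy = pd.DataFrame({
    'District': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Café', 'Café', 'Hotel', 'Park', 'Hotel', 'Hotel'],
})

# one indicator column per category (as floats so the mean is well defined)
onehot = pd.get_dummies(toy['Venue Category'], dtype=float)
onehot.insert(0, 'District', toy['District'])

# the mean of a 0/1 column is the share of rows that equal 1
freq = onehot.groupby('District').mean()
```

In district A, two of the four venues are cafés, so the 'Café' entry is 0.5, and each row of `freq` sums to 1. The values in the table above can be read the same way, for example 0.105263 for BBQ Joint in Akasaka corresponds to 4 of its 38 venues.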

Let's print each district along with its 15 most common venue categories.

In [65]:
num_top_venues = 15

for hood in Tokyo_grouped['District']:
    print("----"+hood+"----")
    temp = Tokyo_grouped[Tokyo_grouped['District'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')
---- Akasaka----
                  venue  freq
0                 Hotel  0.11
1             BBQ Joint  0.11
2          Dessert Shop  0.08
3           Coffee Shop  0.08
4    Kaiseki Restaurant  0.05
5                Bakery  0.05
6          Concert Hall  0.05
7    Chinese Restaurant  0.05
8     French Restaurant  0.05
9      Sushi Restaurant  0.03
10  Japanese Restaurant  0.03
11        Wagashi Place  0.03
12     Unagi Restaurant  0.03
13            Hotel Bar  0.03
14    Indian Restaurant  0.03


---- Aoyama----
                    venue  freq
0        Baseball Stadium  0.17
1       French Restaurant  0.12
2           Wagashi Place  0.08
3                    Café  0.08
4        Sushi Restaurant  0.04
5                Wine Bar  0.04
6         Thai Restaurant  0.04
7   Indonesian Restaurant  0.04
8     Tonkatsu Restaurant  0.04
9                    Park  0.04
10                 Museum  0.04
11                  Trail  0.04
12                 Garden  0.04
13                  Hotel  0.04
14          Rugby Stadium  0.04


---- Asakusa----
                  venue  freq
0           Coffee Shop  0.10
1                  Café  0.08
2   Japanese Restaurant  0.08
3                Bakery  0.06
4   Sukiyaki Restaurant  0.04
5              Sake Bar  0.04
6      Sushi Restaurant  0.03
7       Soba Restaurant  0.03
8             BBQ Joint  0.03
9         Wagashi Place  0.03
10           Steakhouse  0.03
11   Yoshoku Restaurant  0.03
12     Unagi Restaurant  0.03
13               Hostel  0.03
14       Scenic Lookout  0.03


---- Azabu----
                              venue  freq
0                Kaiseki Restaurant  0.05
1                Italian Restaurant  0.05
2                       Event Space  0.05
3                  Ramen Restaurant  0.05
4            Singaporean Restaurant  0.05
5                   Soba Restaurant  0.05
6   Southern / Soul Food Restaurant  0.05
7                     Grocery Store  0.05
8                        Playground  0.05
9                        Steakhouse  0.05
10                      Pizza Place  0.05
11               Chinese Restaurant  0.05
12              Szechuan Restaurant  0.05
13                        Bookstore  0.05
14                             Café  0.05


---- Daikanyama----
                  venue  freq
0           Coffee Shop  0.14
1          Dessert Shop  0.08
2           Pizza Place  0.06
3                   Bar  0.04
4      Ramen Restaurant  0.04
5                  Café  0.04
6       Thai Restaurant  0.04
7   Japanese Restaurant  0.04
8     French Restaurant  0.04
9          Liquor Store  0.02
10   Italian Restaurant  0.02
11       Cosmetics Shop  0.02
12     Sushi Restaurant  0.02
13           Restaurant  0.02
14        Shopping Mall  0.02


---- Ebisu----
                        venue  freq
0                 Coffee Shop  0.13
1                   BBQ Joint  0.07
2          Seafood Restaurant  0.07
3          Italian Restaurant  0.07
4             Udon Restaurant  0.07
5                      Bistro  0.03
6                Cocktail Bar  0.03
7            Sushi Restaurant  0.03
8          Chinese Restaurant  0.03
9              Ice Cream Shop  0.03
10         Dim Sum Restaurant  0.03
11           Ramen Restaurant  0.03
12            Soba Restaurant  0.03
13  Japanese Curry Restaurant  0.03
14        Tonkatsu Restaurant  0.03


---- Hamamatsuchō----
                        venue  freq
0            Sushi Restaurant  0.12
1   Japanese Curry Restaurant  0.06
2                      Garden  0.06
3            Ramen Restaurant  0.06
4             Soba Restaurant  0.06
5         Tonkatsu Restaurant  0.06
6                Burger Joint  0.06
7                       Trail  0.06
8             Buddhist Temple  0.06
9             Udon Restaurant  0.06
10                  BBQ Joint  0.06
11        Japanese Restaurant  0.06
12         Seafood Restaurant  0.06
13           Malay Restaurant  0.06
14               Noodle House  0.06


---- Harajuku----
                    venue  freq
0             Coffee Shop  0.10
1                    Café  0.08
2     Japanese Restaurant  0.08
3               Nightclub  0.04
4       Accessories Store  0.02
5      Italian Restaurant  0.02
6      Chinese Restaurant  0.02
7          Clothing Store  0.02
8             Pizza Place  0.02
9     Szechuan Restaurant  0.02
10                    Pub  0.02
11           Dessert Shop  0.02
12  Street Food Gathering  0.02
13             Steakhouse  0.02
14             Donut Shop  0.02


---- Hiroo----
                 venue  freq
0            BBQ Joint  0.11
1               Bakery  0.11
2          Coffee Shop  0.07
3    French Restaurant  0.07
4         Burger Joint  0.07
5             Sake Bar  0.07
6            Jazz Club  0.07
7                 Park  0.04
8   Italian Restaurant  0.04
9   Chinese Restaurant  0.04
10               Hotel  0.04
11      Breakfast Spot  0.04
12  Israeli Restaurant  0.04
13              Bistro  0.04
14    Sushi Restaurant  0.04


---- Mita----
                  venue  freq
0             BBQ Joint  0.14
1           Pizza Place  0.07
2   Japanese Restaurant  0.07
3         Wagashi Place  0.05
4    Italian Restaurant  0.05
5                Bistro  0.05
6                Bakery  0.05
7           Supermarket  0.02
8           Coffee Shop  0.02
9      Sushi Restaurant  0.02
10  Sukiyaki Restaurant  0.02
11      Soba Restaurant  0.02
12   Chinese Restaurant  0.02
13             Sake Bar  0.02
14            Drugstore  0.02


---- Morishita----
                  venue  freq
0           Coffee Shop  0.08
1   Japanese Restaurant  0.06
2       Soba Restaurant  0.06
3                  Café  0.05
4   Yakitori Restaurant  0.05
5             BBQ Joint  0.05
6      Ramen Restaurant  0.03
7    Tempura Restaurant  0.03
8                Hostel  0.03
9           Pizza Place  0.03
10               Bakery  0.03
11        Wagashi Place  0.03
12   Chinese Restaurant  0.03
13  Sukiyaki Restaurant  0.02
14     Tapas Restaurant  0.02


---- Roppongi----
                    venue  freq
0     Japanese Restaurant  0.06
1        Sushi Restaurant  0.06
2               BBQ Joint  0.06
3               Jazz Club  0.04
4                   Hotel  0.04
5             Coffee Shop  0.04
6               Hotel Bar  0.04
7      Chinese Restaurant  0.04
8         Soba Restaurant  0.04
9     Tonkatsu Restaurant  0.04
10             Steakhouse  0.04
11             Art Museum  0.04
12                    Bar  0.04
13  Vietnamese Restaurant  0.02
14             Restaurant  0.02


---- Ryōgoku----
                  venue  freq
0   Japanese Restaurant  0.14
1              Sake Bar  0.11
2    Chinese Restaurant  0.06
3          Burger Joint  0.06
4      Ramen Restaurant  0.06
5           Coffee Shop  0.06
6       Soba Restaurant  0.06
7              Beer Bar  0.06
8   Dumpling Restaurant  0.03
9                Bistro  0.03
10          Pizza Place  0.03
11       Chocolate Shop  0.03
12         Cupcake Shop  0.03
13      Thai Restaurant  0.03
14     Stationery Store  0.03


---- Shinbashi----
                  venue  freq
0      Sushi Restaurant  0.09
1              Sake Bar  0.06
2           Coffee Shop  0.05
3   Tonkatsu Restaurant  0.05
4   Japanese Restaurant  0.05
5                 Hotel  0.03
6    Chinese Restaurant  0.03
7          Dessert Shop  0.03
8              Boutique  0.03
9        Clothing Store  0.03
10      Soba Restaurant  0.03
11          Pizza Place  0.02
12            Hotel Bar  0.02
13               Market  0.02
14               Lounge  0.02


---- Toranomon----
                  venue  freq
0                  Café  0.08
1                 Hotel  0.08
2               Theater  0.04
3              Sake Bar  0.04
4     French Restaurant  0.04
5             Roof Deck  0.04
6            Restaurant  0.04
7                Garden  0.04
8            Steakhouse  0.04
9           Coffee Shop  0.04
10  Sukiyaki Restaurant  0.04
11       Chocolate Shop  0.04
12   Chinese Restaurant  0.04
13             Tea Room  0.04
14            Hotel Bar  0.04


----Kinshichō----
                  venue  freq
0             BBQ Joint  0.08
1              Sake Bar  0.07
2      Ramen Restaurant  0.07
3         Grocery Store  0.06
4        Ice Cream Shop  0.04
5        Discount Store  0.04
6            Steakhouse  0.04
7                  Café  0.03
8              Wine Bar  0.03
9      Sushi Restaurant  0.03
10  Dumpling Restaurant  0.03
11            Drugstore  0.03
12         Concert Hall  0.01
13          Coffee Shop  0.01
14       Clothing Store  0.01


----Odaiba----
                  venue  freq
0     Convenience Store  0.16
1                  Park  0.07
2        Discount Store  0.05
3           Coffee Shop  0.05
4                 Hotel  0.04
5               Exhibit  0.03
6      Sushi Restaurant  0.03
7        Clothing Store  0.02
8            Theme Park  0.02
9      Toy / Game Store  0.02
10                Trail  0.02
11               Buffet  0.02
12  Japanese Restaurant  0.02
13           Hot Spring  0.02
14   Seafood Restaurant  0.02


----Shibuya----
                        venue  freq
0                        Café  0.10
1                 Coffee Shop  0.08
2         Japanese Restaurant  0.05
3                         Bar  0.05
4                   Nightclub  0.03
5                   BBQ Joint  0.03
6           French Restaurant  0.03
7   Japanese Curry Restaurant  0.03
8                    Wine Bar  0.03
9                Dessert Shop  0.03
10          Indian Restaurant  0.02
11                   Sake Bar  0.02
12        Teishoku Restaurant  0.02
13                Candy Store  0.02
14                   Tea Room  0.02


----Ueno----
                  venue  freq
0             BBQ Joint  0.08
1              Sake Bar  0.07
2      Ramen Restaurant  0.07
3                  Café  0.07
4        History Museum  0.05
5   Tonkatsu Restaurant  0.05
6           Coffee Shop  0.03
7         Wagashi Place  0.03
8    Chinese Restaurant  0.03
9                  Park  0.02
10               Museum  0.02
11       Science Museum  0.02
12      Soba Restaurant  0.02
13           Hobby Shop  0.02
14           Bath House  0.02


Now let's create a dataframe and display the top 10 venues for each district.

In [66]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
In [71]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['District']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # ranks above 3rd use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['District'] = Tokyo_grouped['District']

for ind in np.arange(Tokyo_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Tokyo_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted
Out[71]:
District 1st Most Common Venue 2nd Most Common Venue 3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue 6th Most Common Venue 7th Most Common Venue 8th Most Common Venue 9th Most Common Venue 10th Most Common Venue
0 Akasaka BBQ Joint Hotel Dessert Shop Coffee Shop French Restaurant Chinese Restaurant Bakery Concert Hall Kaiseki Restaurant Beer Bar
1 Aoyama Baseball Stadium French Restaurant Café Wagashi Place Trail Museum Park Tempura Restaurant Thai Restaurant Tonkatsu Restaurant
2 Asakusa Coffee Shop Café Japanese Restaurant Bakery Sake Bar Sukiyaki Restaurant Yoshoku Restaurant Steakhouse Soba Restaurant Scenic Lookout
3 Azabu Kaiseki Restaurant Ramen Restaurant Steakhouse Café Southern / Soul Food Restaurant Soba Restaurant Singaporean Restaurant Chinese Restaurant Playground Bookstore
4 Daikanyama Coffee Shop Dessert Shop Pizza Place Café Ramen Restaurant Japanese Restaurant French Restaurant Bar Thai Restaurant Chinese Restaurant
5 Ebisu Coffee Shop Italian Restaurant Seafood Restaurant Udon Restaurant BBQ Joint Japanese Curry Restaurant Hotel Bar Restaurant Dim Sum Restaurant Cocktail Bar
6 Hamamatsuchō Sushi Restaurant Trail Ramen Restaurant Seafood Restaurant Soba Restaurant Garden Noodle House Burger Joint Buddhist Temple Malay Restaurant
7 Harajuku Coffee Shop Japanese Restaurant Café Nightclub Clothing Store Street Food Gathering Mediterranean Restaurant Gift Shop Chinese Restaurant Garden
8 Hiroo BBQ Joint Bakery Jazz Club Sake Bar French Restaurant Coffee Shop Burger Joint Hotel Sandwich Place Japanese Restaurant
9 Mita BBQ Joint Japanese Restaurant Pizza Place Bakery Wagashi Place Italian Restaurant Bistro Lounge Sushi Restaurant Supermarket
10 Morishita Coffee Shop Soba Restaurant Japanese Restaurant BBQ Joint Yakitori Restaurant Café Chinese Restaurant Ramen Restaurant Tempura Restaurant Hostel
11 Roppongi Japanese Restaurant Sushi Restaurant BBQ Joint Hotel Steakhouse Jazz Club Tonkatsu Restaurant Bar Coffee Shop Hotel Bar
12 Ryōgoku Japanese Restaurant Sake Bar Chinese Restaurant Soba Restaurant Burger Joint Ramen Restaurant Coffee Shop Beer Bar Hostel Stationery Store
13 Shinbashi Sushi Restaurant Sake Bar Tonkatsu Restaurant Coffee Shop Japanese Restaurant Clothing Store Boutique Hotel Chinese Restaurant Soba Restaurant
14 Toranomon Café Hotel Restaurant Shabu-Shabu Restaurant Garden Steakhouse Chocolate Shop Sukiyaki Restaurant French Restaurant Coffee Shop
15 Kinshichō BBQ Joint Ramen Restaurant Sake Bar Grocery Store Ice Cream Shop Discount Store Steakhouse Café Dumpling Restaurant Sushi Restaurant
16 Odaiba Convenience Store Park Discount Store Coffee Shop Hotel Exhibit Sushi Restaurant Buffet Seafood Restaurant Clothing Store
17 Shibuya Café Coffee Shop Bar Japanese Restaurant Wine Bar Dessert Shop Nightclub BBQ Joint Japanese Curry Restaurant French Restaurant
18 Ueno BBQ Joint Ramen Restaurant Sake Bar Café History Museum Tonkatsu Restaurant Coffee Shop Wagashi Place Chinese Restaurant Hobby Shop

- K-Means Clustering

After using one hot encoding and taking the mean of the frequency for each venue category, let's use K-means clustering to create K clusters of data points based on similarities.

Before running the algorithm, we need to determine a suitable number of clusters, k.

Elbow method:

In [87]:
Tokyo_grouped_clustering = Tokyo_grouped.drop(columns='District')
In [89]:
# determine k using elbow method
from sklearn.cluster import KMeans
from sklearn import metrics
from scipy.spatial.distance import cdist
import numpy as np
import matplotlib.pyplot as plt

# k means determine k
distortions = []
K = range(1,11)
for k in K:
    kmeanModel = KMeans(n_clusters=k).fit(Tokyo_grouped_clustering)
    # distortion: mean Euclidean distance from each district to its nearest cluster center
    distortions.append(sum(np.min(cdist(Tokyo_grouped_clustering, kmeanModel.cluster_centers_, 'euclidean'), axis=1)) / Tokyo_grouped_clustering.shape[0])

# Plot the elbow
plt.plot(K, distortions, 'bx-')

plt.xlabel('k')
plt.ylabel('Distortion')
plt.title('The Elbow Method Showing the Optimal k')
plt.show()

The elbow method does not always produce a clear bend in the curve, and that is what happened in this case.

Let's try a different method of finding the best value for k.

Silhouette_score method:

In [107]:
from sklearn.metrics import silhouette_score

sil = []
k_sil = range(2,11)
# the silhouette score needs at least 2 clusters to measure dissimilarity
for k in k_sil:
    print(k, end=" ")
    kmeans = KMeans(n_clusters = k).fit(Tokyo_grouped_clustering)
    labels = kmeans.labels_
    sil.append(silhouette_score(Tokyo_grouped_clustering,labels, metric = 'euclidean'))
2 3 4 5 6 7 8 9 10 
In [109]:
plt.plot(k_sil, sil, 'bo-')
plt.xlabel('k')
plt.ylabel('silhouette_score')
plt.title('Silhouette Method Showing the Optimal k')
plt.show()

The silhouette score peaks at k = 3. However, three clusters would group the districts very broadly.

Therefore, in this case, the number of clusters (i.e. ‘k’) is chosen to be 4.
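
One way to weigh the trade-off between the silhouette score and cluster granularity is to print the cluster sizes next to the score for each candidate k. The sketch below uses a synthetic matrix from `make_blobs` as a stand-in for `Tokyo_grouped_clustering`, so its numbers say nothing about the Tokyo data; `n_init=10` and `random_state=0` are assumptions added for reproducibility.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the district-by-category frequency matrix.
X, _ = make_blobs(n_samples=19, centers=4, n_features=5, random_state=0)

summary = {}
for k in (3, 4):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # Number of rows assigned to each cluster label.
    sizes = np.bincount(model.labels_, minlength=k)
    score = silhouette_score(X, model.labels_)
    summary[k] = (sizes, score)
    print(k, sizes.tolist(), round(score, 3))
```

If one value of k yields a slightly higher score but lumps most rows into a single cluster, a neighboring k may still be the more useful choice for segmentation.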

Run k-means to cluster the districts into 4 clusters.

In [129]:
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 4

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Tokyo_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]
Out[129]:
array([0, 2, 3, 0, 3, 3, 1, 3, 0, 0], dtype=int32)
In [130]:
k_means_labels = kmeans.labels_
k_means_labels
Out[130]:
array([0, 2, 3, 0, 3, 3, 1, 3, 0, 0, 3, 0, 0, 0, 3, 0, 0, 3, 0],
      dtype=int32)
In [131]:
k_means_cluster_centers = kmeans.cluster_centers_

Create a new dataframe that includes the cluster column as well as the top 10 venues for each district.

In [134]:
# Add clustering labels

# neighborhoods_venues_sorted.drop(columns=['Cluster Labels'], inplace=True) #for re-run
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
# tokyo_mergedd = Tokyo_venues_csv

# merge with df_tokyo to add latitude/longitude for each district
# to obtain the final result
tokyo_mergedd = df_tokyo.join(neighborhoods_venues_sorted.set_index('District'), on='District')
tokyo_mergedd['Cluster Labels'] = tokyo_mergedd['Cluster Labels'].fillna(0)
tokyo_mergedd['Cluster Labels'] = tokyo_mergedd['Cluster Labels'].astype(int)




tokyo_mergedd = tokyo_mergedd.drop(columns=[ 'Density'])

tokyo_mergedd.head() # check the last columns!
Out[134]:
Ward District District Latitude District Longitude Cluster Labels 1st Most Common Venue 2nd Most Common Venue 3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue 6th Most Common Venue 7th Most Common Venue 8th Most Common Venue 9th Most Common Venue 10th Most Common Venue
0 Taitō Ueno 35.713376 139.776656 0 BBQ Joint Ramen Restaurant Sake Bar Café History Museum Tonkatsu Restaurant Coffee Shop Wagashi Place Chinese Restaurant Hobby Shop
1 Taitō Asakusa 35.717597 139.797563 3 Coffee Shop Café Japanese Restaurant Bakery Sake Bar Sukiyaki Restaurant Yoshoku Restaurant Steakhouse Soba Restaurant Scenic Lookout
2 Shibuya Shibuya 35.664596 139.698711 3 Café Coffee Shop Bar Japanese Restaurant Wine Bar Dessert Shop Nightclub BBQ Joint Japanese Curry Restaurant French Restaurant
3 Shibuya Ebisu 35.646438 139.710210 3 Coffee Shop Italian Restaurant Seafood Restaurant Udon Restaurant BBQ Joint Japanese Curry Restaurant Hotel Bar Restaurant Dim Sum Restaurant Cocktail Bar
4 Shibuya Harajuku 35.668705 139.705336 3 Coffee Shop Japanese Restaurant Café Nightclub Clothing Store Street Food Gathering Mediterranean Restaurant Gift Shop Chinese Restaurant Garden
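
Note that `fillna(0)` in the cell above would place any district without a match into cluster 0, which is also a genuine cluster label. In this run all 19 districts appear to have received a label, so nothing is affected, but a check along the following lines can make that explicit. This is a sketch on hypothetical stand-ins for `df_tokyo` and `neighborhoods_venues_sorted`; "District X" is an invented name.

```python
import pandas as pd

# Hypothetical stand-ins for df_tokyo and neighborhoods_venues_sorted.
districts = pd.DataFrame({"District": ["Ueno", "Asakusa", "District X"]})
labelled = pd.DataFrame({"District": ["Ueno", "Asakusa"],
                         "Cluster Labels": [0, 3]})

# indicator=True adds a _merge column telling where each row came from.
merged = districts.merge(labelled, on="District", how="left", indicator=True)

# Rows found only on the left side have no cluster label.
unmatched = merged.loc[merged["_merge"] == "left_only", "District"].tolist()
print(unmatched)

# Keep only labelled districts instead of defaulting the rest to cluster 0.
clean = merged[merged["_merge"] == "both"].drop(columns="_merge")
clean["Cluster Labels"] = clean["Cluster Labels"].astype(int)
```

Whether unmatched districts should be dropped or given a separate label is a design choice; the point is only to avoid mixing them silently into an existing cluster.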
In [135]:
tokyo_mergedd.to_csv('tokyo_mergedd_with_labels.csv',index=False)
In [136]:
tokyo_mergedd = pd.read_csv('tokyo_mergedd_with_labels.csv')
In [137]:
#Find null or NaN rows
tokyo_mergedd[tokyo_mergedd.isna().any(axis=1)]
Out[137]:
Ward District District Latitude District Longitude Cluster Labels 1st Most Common Venue 2nd Most Common Venue 3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue 6th Most Common Venue 7th Most Common Venue 8th Most Common Venue 9th Most Common Venue 10th Most Common Venue

Finally, let's visualize the resulting clusters

In [138]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]



# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(tokyo_mergedd['District Latitude'], tokyo_mergedd['District Longitude'], tokyo_mergedd['District'], tokyo_mergedd['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=10,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=1).add_to(map_clusters)

# display the attractions
# add attraction markers to the map
for lat, lng, label in zip(df['Attraction Latitude'], df['Attraction Longitude'], df['Attraction']):
    label = folium.Popup(label, parse_html=True)
    folium.Marker(
    [lat, lng],
    popup=label,
    icon = folium.Icon(color='cadetblue',icon = 'heart',prefix='glyphicon')
    ).add_to(map_clusters) 
       
map_clusters
Out[138]:
(Interactive folium map of the districts colored by cluster label, with attraction markers.)
In [139]:
map_clusters.save('visualize clusters.html')

- Examine the Clusters

Let's examine each cluster and determine the discriminating venue categories that distinguish each cluster.
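
Besides reading the tables, the cluster centroids can hint at which categories set a cluster apart. The sketch below is one possible approach, not the method used in this notebook: it subtracts the overall mean frequency from each centroid and reports the category with the largest excess. The frequency matrix is toy data.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Toy frequency matrix: rows are districts, columns are venue categories.
freq = pd.DataFrame(
    [[0.6, 0.3, 0.1],
     [0.5, 0.4, 0.1],
     [0.1, 0.2, 0.7],
     [0.0, 0.3, 0.7]],
    columns=["Coffee Shop", "BBQ Joint", "Park"],
)

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(freq)

# Each centroid is the average category profile of its cluster.
centers = pd.DataFrame(model.cluster_centers_, columns=freq.columns)

# Subtract the overall mean so categories common to every cluster cancel out.
lift = centers - freq.mean()
top_per_cluster = lift.idxmax(axis=1)
print(top_per_cluster)
```

In the real data the same idea would use `kmeans.cluster_centers_` together with the columns of `Tokyo_grouped_clustering`.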

Cluster 1

In [140]:
tokyo_mergedd.loc[tokyo_mergedd['Cluster Labels'] == 0]
Out[140]:
Ward District District Latitude District Longitude Cluster Labels 1st Most Common Venue 2nd Most Common Venue 3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue 6th Most Common Venue 7th Most Common Venue 8th Most Common Venue 9th Most Common Venue 10th Most Common Venue
0 Taitō Ueno 35.713376 139.776656 0 BBQ Joint Ramen Restaurant Sake Bar Café History Museum Tonkatsu Restaurant Coffee Shop Wagashi Place Chinese Restaurant Hobby Shop
6 Shibuya Hiroo 35.650500 139.717300 0 BBQ Joint Bakery Jazz Club Sake Bar French Restaurant Coffee Shop Burger Joint Hotel Sandwich Place Japanese Restaurant
7 Sumida Kinshichō 35.696312 139.815043 0 BBQ Joint Ramen Restaurant Sake Bar Grocery Store Ice Cream Shop Discount Store Steakhouse Café Dumpling Restaurant Sushi Restaurant
9 Sumida Ryōgoku 35.696854 139.797428 0 Japanese Restaurant Sake Bar Chinese Restaurant Soba Restaurant Burger Joint Ramen Restaurant Coffee Shop Beer Bar Hostel Stationery Store
10 Minato Odaiba 35.619050 139.779364 0 Convenience Store Park Discount Store Coffee Shop Hotel Exhibit Sushi Restaurant Buffet Seafood Restaurant Clothing Store
11 Minato Shinbashi 35.665106 139.756116 0 Sushi Restaurant Sake Bar Tonkatsu Restaurant Coffee Shop Japanese Restaurant Clothing Store Boutique Hotel Chinese Restaurant Soba Restaurant
13 Minato Mita 35.647200 139.740900 0 BBQ Joint Japanese Restaurant Pizza Place Bakery Wagashi Place Italian Restaurant Bistro Lounge Sushi Restaurant Supermarket
14 Minato Roppongi 35.662457 139.733498 0 Japanese Restaurant Sushi Restaurant BBQ Joint Hotel Steakhouse Jazz Club Tonkatsu Restaurant Bar Coffee Shop Hotel Bar
17 Minato Azabu 35.656402 139.733970 0 Kaiseki Restaurant Ramen Restaurant Steakhouse Café Southern / Soul Food Restaurant Soba Restaurant Singaporean Restaurant Chinese Restaurant Playground Bookstore
18 Minato Akasaka 35.671679 139.735622 0 BBQ Joint Hotel Dessert Shop Coffee Shop French Restaurant Chinese Restaurant Bakery Concert Hall Kaiseki Restaurant Beer Bar
In [141]:
tokyo_mergedd.loc[tokyo_mergedd['Cluster Labels'] == 0]['1st Most Common Venue'].value_counts()
Out[141]:
BBQ Joint              5
Japanese Restaurant    2
Kaiseki Restaurant     1
Sushi Restaurant       1
Convenience Store      1
Name: 1st Most Common Venue, dtype: int64

Cluster 2

In [142]:
tokyo_mergedd.loc[tokyo_mergedd['Cluster Labels'] == 1]
Out[142]:
Ward District District Latitude District Longitude Cluster Labels 1st Most Common Venue 2nd Most Common Venue 3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue 6th Most Common Venue 7th Most Common Venue 8th Most Common Venue 9th Most Common Venue 10th Most Common Venue
12 Minato Hamamatsuchō 35.655111 139.757062 1 Sushi Restaurant Trail Ramen Restaurant Seafood Restaurant Soba Restaurant Garden Noodle House Burger Joint Buddhist Temple Malay Restaurant
In [143]:
tokyo_mergedd.loc[tokyo_mergedd['Cluster Labels'] == 1]['1st Most Common Venue'].value_counts()
Out[143]:
Sushi Restaurant    1
Name: 1st Most Common Venue, dtype: int64

Cluster 3

In [144]:
Cluster3 = tokyo_mergedd.loc[tokyo_mergedd['Cluster Labels'] == 2]
Cluster3
Out[144]:
Ward District District Latitude District Longitude Cluster Labels 1st Most Common Venue 2nd Most Common Venue 3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue 6th Most Common Venue 7th Most Common Venue 8th Most Common Venue 9th Most Common Venue 10th Most Common Venue
16 Minato Aoyama 35.672 139.723 2 Baseball Stadium French Restaurant Café Wagashi Place Trail Museum Park Tempura Restaurant Thai Restaurant Tonkatsu Restaurant
In [145]:
tokyo_mergedd.loc[tokyo_mergedd['Cluster Labels'] == 2]['1st Most Common Venue'].value_counts()
Out[145]:
Baseball Stadium    1
Name: 1st Most Common Venue, dtype: int64

Cluster 4

In [146]:
tokyo_mergedd.loc[tokyo_mergedd['Cluster Labels'] == 3]
Out[146]:
Ward District District Latitude District Longitude Cluster Labels 1st Most Common Venue 2nd Most Common Venue 3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue 6th Most Common Venue 7th Most Common Venue 8th Most Common Venue 9th Most Common Venue 10th Most Common Venue
1 Taitō Asakusa 35.717597 139.797563 3 Coffee Shop Café Japanese Restaurant Bakery Sake Bar Sukiyaki Restaurant Yoshoku Restaurant Steakhouse Soba Restaurant Scenic Lookout
2 Shibuya Shibuya 35.664596 139.698711 3 Café Coffee Shop Bar Japanese Restaurant Wine Bar Dessert Shop Nightclub BBQ Joint Japanese Curry Restaurant French Restaurant
3 Shibuya Ebisu 35.646438 139.710210 3 Coffee Shop Italian Restaurant Seafood Restaurant Udon Restaurant BBQ Joint Japanese Curry Restaurant Hotel Bar Restaurant Dim Sum Restaurant Cocktail Bar
4 Shibuya Harajuku 35.668705 139.705336 3 Coffee Shop Japanese Restaurant Café Nightclub Clothing Store Street Food Gathering Mediterranean Restaurant Gift Shop Chinese Restaurant Garden
5 Shibuya Daikanyama 35.648157 139.703293 3 Coffee Shop Dessert Shop Pizza Place Café Ramen Restaurant Japanese Restaurant French Restaurant Bar Thai Restaurant Chinese Restaurant
8 Sumida Morishita 35.687998 139.797044 3 Coffee Shop Soba Restaurant Japanese Restaurant BBQ Joint Yakitori Restaurant Café Chinese Restaurant Ramen Restaurant Tempura Restaurant Hostel
15 Minato Toranomon 35.670187 139.750056 3 Café Hotel Restaurant Shabu-Shabu Restaurant Garden Steakhouse Chocolate Shop Sukiyaki Restaurant French Restaurant Coffee Shop
In [147]:
tokyo_mergedd.loc[tokyo_mergedd['Cluster Labels'] == 3]['1st Most Common Venue'].value_counts()
Out[147]:
Coffee Shop    5
Café           2
Name: 1st Most Common Venue, dtype: int64
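
The four `value_counts()` calls above can also be condensed into a single summary table. The following is a small sketch on a toy stand-in for `tokyo_mergedd`; the rows are illustrative and the column names `districts` and `top_venue` are chosen here, not taken from the notebook.

```python
import pandas as pd

# Toy stand-in for tokyo_mergedd; values are illustrative only.
df = pd.DataFrame({
    "Cluster Labels": [0, 0, 0, 3, 3, 3],
    "1st Most Common Venue": ["BBQ Joint", "BBQ Joint", "Sushi Restaurant",
                              "Coffee Shop", "Coffee Shop", "Bar"],
})

# One row per cluster: how many districts it holds and its most frequent top venue.
summary = (
    df.groupby("Cluster Labels")["1st Most Common Venue"]
      .agg(districts="size", top_venue=lambda s: s.value_counts().idxmax())
)
print(summary)
```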

Let's do clustering based on wards

In [10]:
# one hot encoding
tokyo_onehot = pd.get_dummies(Tokyo_venues_csv[['Venue Category']], prefix="", prefix_sep="")

# add the Ward column back to the dataframe
tokyo_onehot['Ward'] = Tokyo_venues_csv['Ward']

# move the Ward column to the first position
fixed_columns = [tokyo_onehot.columns[-1]] + list(tokyo_onehot.columns[:-1])
tokyo_onehot = tokyo_onehot[fixed_columns]

tokyo_onehot[100:300]
In [237]:
Tokyo_grouped2 = tokyo_onehot.groupby('Ward').mean().reset_index()
Tokyo_grouped2
Out[237]:
Ward American Restaurant Aquarium Art Gallery Art Museum Arts & Crafts Store Auditorium Australian Restaurant BBQ Joint Baby Store ... Tunnel Udon Restaurant Unagi Restaurant Vegetarian / Vegan Restaurant Vietnamese Restaurant Wagashi Place Wine Bar Yakitori Restaurant Yoshoku Restaurant Zoo
0 Minato 0.007463 0.000000 0.004975 0.014925 0.000000 0.002488 0.004975 0.037313 0.000000 ... 0.002488 0.012438 0.004975 0.000000 0.000000 0.014925 0.002488 0.007463 0.012438 0.000000
1 Shibuya 0.004608 0.000000 0.004608 0.018433 0.004608 0.000000 0.000000 0.027650 0.000000 ... 0.000000 0.013825 0.004608 0.000000 0.004608 0.009217 0.009217 0.009217 0.004608 0.000000
2 Sumida 0.005525 0.000000 0.000000 0.005525 0.000000 0.000000 0.000000 0.082873 0.005525 ... 0.000000 0.011050 0.000000 0.000000 0.005525 0.011050 0.000000 0.022099 0.016575 0.000000
3 Taitō 0.000000 0.006024 0.000000 0.018072 0.012048 0.000000 0.000000 0.054217 0.000000 ... 0.000000 0.006024 0.006024 0.006024 0.000000 0.036145 0.000000 0.000000 0.030120 0.006024

4 rows × 185 columns

In [238]:
num_top_venues = 25

for hood in Tokyo_grouped2['Ward']:
    print("----"+hood+"----")
    temp = Tokyo_grouped2[Tokyo_grouped2['Ward'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')
----Minato----
                  venue  freq
0                  Café  0.04
1             BBQ Joint  0.04
2                 Hotel  0.04
3                Bakery  0.03
4   Japanese Restaurant  0.03
5    Italian Restaurant  0.03
6           Coffee Shop  0.03
7      Sushi Restaurant  0.03
8                  Park  0.03
9     French Restaurant  0.03
10    Convenience Store  0.02
11      Soba Restaurant  0.02
12   Chinese Restaurant  0.02
13     Ramen Restaurant  0.02
14   Seafood Restaurant  0.02
15         Dessert Shop  0.02
16         Concert Hall  0.01
17           Steakhouse  0.01
18   Kaiseki Restaurant  0.01
19          Supermarket  0.01
20  American Restaurant  0.01
21       Discount Store  0.01
22          Pizza Place  0.01
23              Exhibit  0.01
24            Hotel Bar  0.01


----Shibuya----
                        venue  freq
0                 Coffee Shop  0.10
1                        Café  0.06
2         Japanese Restaurant  0.05
3            Ramen Restaurant  0.04
4           French Restaurant  0.03
5                   BBQ Joint  0.03
6          Italian Restaurant  0.03
7                      Bakery  0.03
8                         Bar  0.02
9          Chinese Restaurant  0.02
10                  Bookstore  0.02
11  Japanese Curry Restaurant  0.02
12                 Art Museum  0.02
13                   Sake Bar  0.02
14                  Nightclub  0.02
15            Soba Restaurant  0.01
16               Concert Hall  0.01
17           Sushi Restaurant  0.01
18               Cocktail Bar  0.01
19             Clothing Store  0.01
20               Dessert Shop  0.01
21                   Tea Room  0.01
22                Pizza Place  0.01
23                       Park  0.01
24            Thai Restaurant  0.01


----Sumida----
                      venue  freq
0          Ramen Restaurant  0.08
1                 BBQ Joint  0.08
2                  Sake Bar  0.07
3       Japanese Restaurant  0.06
4               Coffee Shop  0.04
5           Soba Restaurant  0.03
6       Dumpling Restaurant  0.03
7                      Café  0.03
8        Chinese Restaurant  0.03
9        Italian Restaurant  0.02
10                 Beer Bar  0.02
11                   Hostel  0.02
12       Tempura Restaurant  0.02
13        Indian Restaurant  0.02
14                   Bakery  0.02
15            Deli / Bodega  0.02
16          Thai Restaurant  0.02
17      Yakitori Restaurant  0.02
18       Yoshoku Restaurant  0.02
19                Gastropub  0.01
20                   Garden  0.01
21  South Indian Restaurant  0.01
22      American Restaurant  0.01
23             Gourmet Shop  0.01
24              Sports Club  0.01


----Taitō----
                    venue  freq
0        Ramen Restaurant  0.06
1                Sake Bar  0.06
2                    Café  0.06
3             Coffee Shop  0.05
4               BBQ Joint  0.05
5         Soba Restaurant  0.04
6           Wagashi Place  0.04
7     Tonkatsu Restaurant  0.03
8      Yoshoku Restaurant  0.03
9      Chinese Restaurant  0.03
10    Japanese Restaurant  0.03
11             Hobby Shop  0.02
12         History Museum  0.02
13           Dessert Shop  0.02
14                 Bakery  0.02
15             Art Museum  0.02
16  Korean BBQ Restaurant  0.01
17         Massage Studio  0.01
18           Liquor Store  0.01
19    Monument / Landmark  0.01
20               Aquarium  0.01
21  Kushikatsu Restaurant  0.01
22     Italian Restaurant  0.01
23       Kebab Restaurant  0.01
24     Kaiseki Restaurant  0.01


In [239]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Ward']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # ranks beyond the third use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted2 = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted2['Ward'] = Tokyo_grouped2['Ward']

for ind in np.arange(Tokyo_grouped2.shape[0]):
    neighborhoods_venues_sorted2.iloc[ind, 1:] = return_most_common_venues(Tokyo_grouped2.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted2
Out[239]:
Ward 1st Most Common Venue 2nd Most Common Venue 3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue 6th Most Common Venue 7th Most Common Venue 8th Most Common Venue 9th Most Common Venue 10th Most Common Venue
0 Minato Hotel Café BBQ Joint Japanese Restaurant Italian Restaurant French Restaurant Sushi Restaurant Park Coffee Shop Bakery
1 Shibuya Coffee Shop Café Japanese Restaurant Ramen Restaurant French Restaurant Italian Restaurant Bakery BBQ Joint Chinese Restaurant Japanese Curry Restaurant
2 Sumida BBQ Joint Ramen Restaurant Sake Bar Japanese Restaurant Coffee Shop Soba Restaurant Chinese Restaurant Dumpling Restaurant Café Thai Restaurant
3 Taitō Ramen Restaurant Café Sake Bar BBQ Joint Coffee Shop Wagashi Place Soba Restaurant Tonkatsu Restaurant Japanese Restaurant Yoshoku Restaurant

Because there are only 4 wards, asking for 4 clusters simply places each ward in its own cluster, so K-means adds no information beyond what the dataframe above already shows. We apply it anyway as a check on the results.
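
The sketch below illustrates why this is expected: with as many clusters as distinct rows, every row ends up in its own cluster and the within-cluster sum of squares is zero. The four rows are made-up numbers, not the actual ward frequencies, and `n_init=10` is an assumption added for stability.

```python
import numpy as np
from sklearn.cluster import KMeans

# Four distinct toy "ward" profiles (illustrative numbers, not the real data).
X = np.array([[0.04, 0.03, 0.04],
              [0.10, 0.04, 0.03],
              [0.04, 0.08, 0.08],
              [0.05, 0.06, 0.05]])

model = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

# With as many clusters as rows, each row becomes its own cluster
# and the within-cluster sum of squares (inertia) drops to zero.
print(sorted(model.labels_.tolist()), model.inertia_)
```

The label numbers themselves are arbitrary, so the ward-level labels that follow carry no ordering or similarity information.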

In [240]:
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 4

Tokyo_grouped2 = Tokyo_grouped2.drop(columns='Ward')

# run k-means clustering
kmeans2 = KMeans(n_clusters=kclusters, random_state=0).fit(Tokyo_grouped2)

# check cluster labels generated for each row in the dataframe
kmeans2.labels_[0:10]
Out[240]:
array([0, 2, 3, 1], dtype=int32)
In [250]:
# Add clustering labels

# neighborhoods_venues_sorted2.drop(columns=['Cluster Labels'], inplace=True) #for re-run
neighborhoods_venues_sorted2.insert(0, 'Cluster Labels', kmeans2.labels_)
tokyo_mergedd2 = neighborhoods_venues_sorted2

# tokyo_mergedd2 = df_tokyo.join(neighborhoods_venues_sorted2.set_index('Ward'), on='Ward')
tokyo_mergedd2['Cluster Labels'] = tokyo_mergedd2['Cluster Labels'].fillna(0)
tokyo_mergedd2['Cluster Labels'] = tokyo_mergedd2['Cluster Labels'].astype(int)



tokyo_mergedd2 # check the last columns!
Out[250]:
Cluster Labels Ward 1st Most Common Venue 2nd Most Common Venue 3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue 6th Most Common Venue 7th Most Common Venue 8th Most Common Venue 9th Most Common Venue 10th Most Common Venue
0 0 Minato Hotel Café BBQ Joint Japanese Restaurant Italian Restaurant French Restaurant Sushi Restaurant Park Coffee Shop Bakery
1 2 Shibuya Coffee Shop Café Japanese Restaurant Ramen Restaurant French Restaurant Italian Restaurant Bakery BBQ Joint Chinese Restaurant Japanese Curry Restaurant
2 3 Sumida BBQ Joint Ramen Restaurant Sake Bar Japanese Restaurant Coffee Shop Soba Restaurant Chinese Restaurant Dumpling Restaurant Café Thai Restaurant
3 1 Taitō Ramen Restaurant Café Sake Bar BBQ Joint Coffee Shop Wagashi Place Soba Restaurant Tonkatsu Restaurant Japanese Restaurant Yoshoku Restaurant
In [251]:
tokyo_mergedd2.loc[tokyo_mergedd2['Cluster Labels'] == 0]
Out[251]:
Cluster Labels Ward 1st Most Common Venue 2nd Most Common Venue 3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue 6th Most Common Venue 7th Most Common Venue 8th Most Common Venue 9th Most Common Venue 10th Most Common Venue
0 0 Minato Hotel Café BBQ Joint Japanese Restaurant Italian Restaurant French Restaurant Sushi Restaurant Park Coffee Shop Bakery
In [252]:
tokyo_mergedd2.loc[tokyo_mergedd2['Cluster Labels'] == 1]
Out[252]:
Cluster Labels Ward 1st Most Common Venue 2nd Most Common Venue 3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue 6th Most Common Venue 7th Most Common Venue 8th Most Common Venue 9th Most Common Venue 10th Most Common Venue
3 1 Taitō Ramen Restaurant Café Sake Bar BBQ Joint Coffee Shop Wagashi Place Soba Restaurant Tonkatsu Restaurant Japanese Restaurant Yoshoku Restaurant
In [253]:
tokyo_mergedd2.loc[tokyo_mergedd2['Cluster Labels'] == 2]
Out[253]:
Cluster Labels Ward 1st Most Common Venue 2nd Most Common Venue 3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue 6th Most Common Venue 7th Most Common Venue 8th Most Common Venue 9th Most Common Venue 10th Most Common Venue
1 2 Shibuya Coffee Shop Café Japanese Restaurant Ramen Restaurant French Restaurant Italian Restaurant Bakery BBQ Joint Chinese Restaurant Japanese Curry Restaurant
In [254]:
tokyo_mergedd2.loc[tokyo_mergedd2['Cluster Labels'] == 3]
Out[254]:
Cluster Labels Ward 1st Most Common Venue 2nd Most Common Venue 3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue 6th Most Common Venue 7th Most Common Venue 8th Most Common Venue 9th Most Common Venue 10th Most Common Venue
2 3 Sumida BBQ Joint Ramen Restaurant Sake Bar Japanese Restaurant Coffee Shop Soba Restaurant Chinese Restaurant Dumpling Restaurant Café Thai Restaurant

Result and Discussion

In this analysis, we examined 4 major tourism wards and grouped their 19 major districts, described by a total of 184 venue categories, into 4 clusters. Based on the cluster tables above, the clusters can be generalized as follows:

Cluster 1: 10 districts. The most common venues are BBQ joints and Japanese restaurants.
Cluster 2: 1 district (Hamamatsuchō). The most common venue is a sushi restaurant.
Cluster 3: 1 district (Aoyama). The most common venue is a baseball stadium.
Cluster 4: 7 districts. The most common venues are coffee shops and cafés.

If we cluster the venues by ward, the clusters can be generalized as follows:

Cluster 1: Minato. The most common venue is Hotel.
Cluster 2: Taitō. The most common venue is Ramen Restaurant.
Cluster 3: Shibuya. The most common venue is Coffee Shop.
Cluster 4: Sumida. The most common venue is BBQ Joint.

The results show that the majority of venues in Tokyo’s popular areas are eateries. However, the clusters also differ in their most common venue types, which points to different activities a visitor can choose from in each area of Tokyo.


Conclusion

We explored how machine learning can help the tourism industry, how the modern tourism industry uses machine learning, and how to build a simple recommendation and suggestion system. The tourism industry is in shambles during the COVID-19 pandemic, but once the situation improves, it will most likely bounce back to its previous high numbers. Machine learning will be right there, helping companies stay afloat and generate growth.
